Abstract

In this dissertation, a procedure for estimating the parameters of
a quantile regression function is investigated. The procedure is based
on the work of Parzen (1979a) in the theory of quantile functions and
is applicable to a wide range of distributional families.

The  procedure  assumes  the  quantile  functions  of  k  populations  to  be 
location-scale  shifts  of  a  common  quantile  function.  First,  a  goodness- 
of-fit  procedure  for  determining  the  common  distributional  shape  of  the 
k  populations  generalizes  the  one-population  data  modeling  techniques 
of Parzen (1979a). An estimator of the shape parameter of a
distribution is also investigated. The methods of Ogawa (1951) and Eubank
(1979)  are  then  used  for  estimating  the  location  and  scale  parameters 
of  the  k  populations.  A  regression  model  for  the  location  and  scale 
parameters  is  specified,  and  the  resulting  estimators  of  the  regression 
parameters  are  used  to  determine  a  regression  function  for  any  quantiles 
of the observed data. Finally, it is shown that inferences about the
quantile relationships can be based on the asymptotic normality of the
estimated parameters. The procedures are applied to some published
data sets.

1. THE PROBLEM OF K-SAMPLE QUANTILE REGRESSION

The technique of regression analysis is used to model the
relationship between the mean of a response variable Y and a predictor
variable  X.  In  some  situations  it  may  be  more  useful  to  model  the 
relationship  between  the  percentiles  (or  quantiles)  of  a  response 
variable  Y  and  the  values  of  a  predictor  variable  X. 

Hogg  (1975),  Griffiths  and  Willcox  (1978),  and  Angers  (1979) 
investigate  the  relationship  between  several  percentiles  of  salary 
level  for  professors  at  a  major  university  as  a  function  of  their 
years in service. Hogg (1975) uses a nonparametric graphic
technique to estimate linear percentile relationships. Griffiths and
Willcox  (1978)  use  a  maximum  likelihood  approach  based  on  assuming 
the  data  to  have  a  normal  distribution.  Angers  (1979)  adopts  a 
nonparametric  approach  using  linear  grafted  polynomials.  He  assumes 
that  a  specific  dependent  relationship  exists  among  the  various 
percentile  regression  curves. 

Reliability and survival analyses often lead to situations
where one is interested in modeling percentiles of the survival
distribution as a function of the treatment, e.g. modeling the
median survival time of fish as a function of water temperature.

Matis and Wehrly (1979) investigate the resistance of the green
sunfish, Lepomis cyanellus, to various levels of thermal pollution
using a compartmental models approach. The data consist of survival
times of fish at fixed temperatures. They assume the data to have a
three-parameter Weibull distribution and estimate all three parameters
for several temperatures. LaRiccia (1979) analyzes the same data
using minimum quantile distance estimators of the parameters. Our
goal is to estimate the relationship between the percentiles of the
survival times and the test temperature.

Citations follow the format of the Journal of the American
Statistical Association.

Thus we consider the following statistical situation. Consider
random samples from $k$ ($k \ge 2$) populations, i.e. for
$i = 1, \ldots, k$, let $\{Y_{i1}, \ldots, Y_{in_i}\}$ be $n_i$
independent observations of a random variable $Y_i$ which has
cumulative distribution function

$$F_i(y) = \Pr(Y_i \le y)$$

and quantile function

$$Q_i(u) = F_i^{-1}(u) = \inf\{y : F_i(y) \ge u\}, \quad 0 \le u \le 1 .$$

Associated with $Y_i$ is a numerical characteristic, $X_i$, of the $i$th
population and we assume for convenience that $X_1 \le \ldots \le X_k$.
Thus $X_1, \ldots, X_k$ would be the various years in service or water
temperatures in the examples cited above.

The k-sample quantile regression problem is to find estimators
of and make inferences about $A(u)$ and $B(u)$ in the k-sample
regression model

$$Q_i(u) = A(u) + B(u) X_i , \quad i = 1, \ldots, k .$$


The purpose of this dissertation is to investigate a method for
determining such estimators based on the approach to quantile functions
presented by Parzen (1979a).

We assume a location-scale shift model for $Q_i$, i.e. that
$Q_i$ can be written as a location-scale shift of some common
quantile function. We write

$$Q_i(u) = \mu_i + \sigma_i Q_0(u) , \quad i = 1, \ldots, k ,$$

where $\mu_i$ and $\sigma_i$ are the location and scale parameters
respectively of $Q_i$ and where either the form of $Q_0$ is unknown but
does not depend on any unknown parameters or
$Q_0(u) = [Q_0^*(u)]^\gamma$ where $Q_0^*(\cdot)$ is a known, completely
specified quantile function and $\gamma$ is an unknown shape parameter.
For example, we may believe that $Q_0$ corresponds to either a
standard normal so that

$$Q_0(u) = \Phi^{-1}(u) ,$$

where

$$\Phi(y) = \int_{-\infty}^{y} (1/\sqrt{2\pi}) \exp(-t^2/2) \, dt ,$$

or a standard lognormal distribution so that

$$Q_0(u) = \exp(\Phi^{-1}(u)) .$$

On the other hand we may believe that $Q_0$ corresponds to a
three-parameter Weibull distribution so that

$$Q_0(u) = (-\log(1 - u))^\gamma$$

where the shape parameter $\gamma$ needs to be estimated.

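We further assume that $\mu_i$ and $\sigma_i$ are linearly related to
$X_i$, i.e.

$$\mu_i = \alpha_\mu + \beta_\mu X_i ,$$

$$\sigma_i = \alpha_\sigma + \beta_\sigma X_i , \quad i = 1, \ldots, k .$$

Thus, we can write the quantile regression model

$$Q_i(u) = [\alpha_\mu + \beta_\mu X_i] + [\alpha_\sigma + \beta_\sigma X_i] Q_0(u)$$

$$\quad = [\alpha_\mu + \alpha_\sigma Q_0(u)] + [\beta_\mu + \beta_\sigma Q_0(u)] X_i .$$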
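As a minimal sketch of the model just written (not part of the original text), the following Python fragment evaluates the intercept $A(u) = \alpha_\mu + \alpha_\sigma Q_0(u)$ and slope $B(u) = \beta_\mu + \beta_\sigma Q_0(u)$ for a chosen $u$. The function names and all numerical parameter values are hypothetical and serve only as an illustration; $Q_0$ is taken first as the standard normal quantile function and then as the exponential one.

```python
import math
from statistics import NormalDist


def weibull_q0(u, gamma):
    """Standard Weibull quantile function Q0(u) = (-log(1 - u))**gamma."""
    return (-math.log(1.0 - u)) ** gamma


def quantile_line(u, alpha_mu, beta_mu, alpha_sigma, beta_sigma, q0):
    """Return (A(u), B(u)) such that Q_i(u) = A(u) + B(u) * X_i."""
    q = q0(u)
    return alpha_mu + alpha_sigma * q, beta_mu + beta_sigma * q


# Hypothetical regression parameters, chosen only for illustration.
params = dict(alpha_mu=10.0, beta_mu=0.5, alpha_sigma=2.0, beta_sigma=0.1)

normal_q0 = NormalDist().inv_cdf           # Q0(u) = Phi^{-1}(u)
a_med, b_med = quantile_line(0.5, q0=normal_q0, **params)
a_90, b_90 = quantile_line(0.9, q0=normal_q0, **params)

# Same parameters with an exponential shape (Weibull with gamma = 1).
a_exp, b_exp = quantile_line(0.5, q0=lambda u: weibull_q0(u, 1.0), **params)
```

Under a symmetric $Q_0$ the median line reduces to the location regression, since $Q_0(.5) = 0$; the lines for other quantiles fan out when $\beta_\sigma \neq 0$.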

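The aim of this dissertation is to investigate

1) methods for identifying the shape of $Q_0$, i.e. either choose
a completely specified function from possible contenders or
estimate $\gamma$,

2) methods for determining estimators $\hat\alpha_\mu$, $\hat\beta_\mu$,
$\hat\alpha_\sigma$, $\hat\beta_\sigma$ of $\alpha_\mu$, $\beta_\mu$,
$\alpha_\sigma$, $\beta_\sigma$, and

3) the properties of estimating $A(u)$ and $B(u)$ by

$$\hat A(u) = \hat\alpha_\mu + \hat\alpha_\sigma Q_0(u)$$

$$\hat B(u) = \hat\beta_\mu + \hat\beta_\sigma Q_0(u) .$$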

Section  2  presents  basic  definitions  and  theorems  regarding  the 
quantile  function  and  the  empirical  quantile  function.  The  Weibull 
distribution  and  its  properties  are  also  discussed. 
In Section 3, Parzen's (1979a) nonparametric data modeling
method of determining $Q_0$ for one population is described and
extended to determining a common $Q_0$ for $k$ populations. An estimator
of the shape parameter $\gamma$ is proposed and its properties are
investigated.

In  Section  4  we  discuss  two  formulations  for  estimating  location 
and  scale  parameters  using  linear  combinations  of  order  statistics. 

The  approaches  are  due  to  Ogawa  (1951)  and  Eubank  (1979). 

In Section 5 we develop new methods for k-sample quantile
regression using the models discussed above. Hypothesis testing
procedures are provided. The application of the technique to a
particular type of location-scale comparison problem is also discussed.

Finally in Section 6 the techniques of Sections 3 through 5
are applied to the analysis of the Hogg data and the Matis and Wehrly
data.

Section  7  consists  of  conclusions  and  suggested  topics  for 
future  research. 
2. QUANTILE FUNCTIONS AND THE WEIBULL DISTRIBUTION

In Section 2.1 we introduce the quantile function notation of
Parzen (1979a) and state some useful theorems and properties of the
quantile function. In Section 2.2 we define the Weibull and extreme
value distributions and provide plots of the Weibull quantile and
density quantile functions for a range of values of its shape
parameter. Lower bounds on the variance for unbiased estimators of the
parameters of the Weibull distribution are given.

2.1  Definitions  and  Notation  of  the  Quantile  Function  Approach 

We  adopt  the  quantile  function  notation  of  Parzen  (1979a). 

Some  useful  definitions  are: 

1. The cumulative distribution function (cdf) of a random
variable X is defined by $F(x) = \Pr(X \le x)$.

2. The quantile function of X, $Q(u)$, is defined by

$$Q(u) = F^{-1}(u) = \inf\{x : F(x) \ge u\}, \quad 0 \le u \le 1 .$$

3. The probability density function of a continuous random
variable X is defined by

$$f(x) = dF(x)/dx$$

so that

$$F(x) = \int_{-\infty}^{x} f(t) \, dt .$$

4. The quantile density function, $q(u)$, is defined by

$$q(u) = dQ(u)/du , \quad 0 < u < 1 .$$

5. The density quantile function, $fQ(u)$, is defined by

$$fQ(u) = f(Q(u)) , \quad 0 < u < 1 .$$


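The sample analogs of the above quantities are presented in
the following definitions. Let $X_{1;n} \le \ldots \le X_{n;n}$ be the
order statistics of a random sample of size $n$ from a population with
cdf $F$.

6. The empirical distribution function (edf), $\tilde F(x)$, is given
by

$$\tilde F(x) = \begin{cases} 0 & \text{if } x < X_{1;n} , \\ j/n & \text{if } X_{j;n} \le x < X_{j+1;n} , \; j = 1, \ldots, n-1 , \\ 1 & \text{if } X_{n;n} \le x . \end{cases}$$

Equivalently,

$$\tilde F(x) = (1/n) \sum_{j=1}^{n} \delta_j(x) ,$$

where

$$\delta_j(x) = \begin{cases} 1 & \text{if } X_j \le x , \\ 0 & \text{otherwise} . \end{cases}$$

7. The empirical quantile function, $\tilde Q(u)$, is defined by

$$\tilde Q(u) = \tilde F^{-1}(u) = X_{j;n} , \quad (j-1)/n < u \le j/n , \quad j = 1, \ldots, n .$$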

While this is a natural definition of $\tilde Q(u)$, two other continuous
definitions discussed by Parzen (1979a) are useful in both
theoretical and applied problems. The piecewise linear version of
$\tilde Q(u)$ is defined by

$$\tilde Q_L(u) = n[j/n - u] X_{j-1;n} + n[u - (j-1)/n] X_{j;n} , \quad (j-1)/n \le u \le j/n , \quad j = 1, \ldots, n , \qquad (2.1.1)$$

where $X_{0;n}$ is a natural minimum, i.e. a lower bound on the
range of the data, if one exists, and $X_{0;n} = X_{1;n}$ otherwise.

The shifted piecewise linear version of $\tilde Q(u)$ is defined by

$$\tilde Q_S(u) = n[(j + .5)/n - u] X_{j;n} + n[u - (j - .5)/n] X_{j+1;n} , \quad (j - .5)/n \le u \le (j + .5)/n , \quad j = 1, \ldots, n-1 .$$

We leave $\tilde Q_S(u)$ undefined for $u < .5/n$ or $u > 1 - .5/n$.
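The three versions of the empirical quantile function defined above (step, piecewise linear, and shifted piecewise linear) can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the input is assumed to be already sorted, and mapping $u = 0$ to the first order statistic in the step version is a convenience not specified in the text.

```python
import math


def q_step(x_sorted, u):
    """Step version: Q(u) = X_{j;n} for (j-1)/n < u <= j/n."""
    n = len(x_sorted)
    j = max(1, math.ceil(n * u))      # u = 0 is mapped to X_{1;n} (assumption)
    return x_sorted[j - 1]


def q_linear(x_sorted, u, x0=None):
    """Piecewise linear version (2.1.1); x0 is the natural minimum X_{0;n}."""
    n = len(x_sorted)
    # xs[j] holds X_{j;n}; X_{0;n} defaults to X_{1;n} when no minimum exists.
    xs = [x_sorted[0] if x0 is None else x0] + list(x_sorted)
    j = min(n, max(1, math.ceil(n * u)))
    return n * (j / n - u) * xs[j - 1] + n * (u - (j - 1) / n) * xs[j]


def q_shifted(x_sorted, u):
    """Shifted piecewise linear version, defined on [.5/n, 1 - .5/n] only."""
    n = len(x_sorted)
    if u < 0.5 / n or u > 1 - 0.5 / n:
        raise ValueError("u outside [.5/n, 1 - .5/n]")
    j = min(n - 1, max(1, math.floor(n * u + 0.5)))
    lo, hi = x_sorted[j - 1], x_sorted[j]          # X_{j;n}, X_{j+1;n}
    return n * ((j + 0.5) / n - u) * lo + n * (u - (j - 0.5) / n) * hi


data = [1.0, 2.0, 4.0, 8.0]
```

With these definitions the piecewise linear version passes through $X_{j;n}$ at $u = j/n$, while the shifted version passes through $X_{j;n}$ at $u = (j - .5)/n$.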


8. If we use the shifted piecewise linear version $\tilde Q_S(u)$ then
we can define the empirical quantile density, $\tilde q(u)$, as

$$\tilde q(u) = d\tilde Q_S(u)/du = n(X_{j+1;n} - X_{j;n}) , \quad (j - .5)/n < u < (j + .5)/n , \quad j = 1, \ldots, n-1 , \qquad (2.1.2)$$

and the empirical density quantile function, $\tilde{fQ}(u)$, as

$$\tilde{fQ}(u) = 1 / \tilde q(u) .$$
Two useful properties of the quantile function are given in
Theorems 2.1.1 and 2.1.2.

Theorem 2.1.1: Let $F(\cdot)$ be a strictly increasing cdf and let
$g(\cdot)$ be a strictly increasing continuous function. If $Y = g(X)$,
then $Q_Y(u) = g(Q_X(u))$. If $g(\cdot)$ is strictly decreasing, then
$Q_Y(u) = g(Q_X(1 - u))$.

Thus, if $Y = \mu + \sigma X$, then $Q_Y(u) = \mu + \sigma Q_X(u)$, and
if $Y = \log(X)$, then $Q_Y(u) = \log(Q_X(u))$. This property of the
quantile function provides a natural representation for parameter
estimation since it allows one to formulate the estimation of location
and scale parameters as the estimation of parameters in the simple
linear regression of $Q_Y$ on $Q_X$ if $Q_X$ is a simple known function.

Theorem 2.1.2: Let $fQ(\cdot)$ and $q(\cdot)$ be the density quantile and
quantile density functions corresponding to $Q(\cdot)$. Then
$fQ(u) = 1/q(u)$.
Definitions and useful theorems regarding the asymptotic
distribution of $\tilde Q(u)$ follow.

9. A Brownian Bridge process $\{B(u), 0 \le u \le 1\}$ is a zero mean
normal process with covariance kernel

$$K_B(u_1, u_2) = \mathrm{Cov}(B(u_1), B(u_2)) = \min(u_1, u_2) - u_1 u_2 .$$

Theorem 2.1.3: Under suitable conditions (see Csorgo and Revesz 1978)

$$\sqrt{n} \, fQ(u) (\tilde Q(u) - Q(u)) \xrightarrow{d} B(u) , \quad \text{for all } u ,$$

where the symbol $\xrightarrow{d}$ denotes "converges in distribution to".

A special case of this convergence theorem is the following:

Theorem 2.1.4: Let $F$ be an absolutely continuous cdf with pdf $f$ and
let $0 < u_1 < \ldots < u_r < 1$. If $fQ$ is differentiable in a
neighborhood of $u_i$ and $fQ(u_i) \ne 0$, for all $i$, then

$$\sqrt{n} (\tilde{\mathbf{Q}} - \mathbf{Q}) \xrightarrow{d} N_r(\mathbf{0}_r, C)$$

where

$$\tilde{\mathbf{Q}} = (\tilde Q(u_1), \ldots, \tilde Q(u_r))^T ,$$

$$\mathbf{Q} = (Q(u_1), \ldots, Q(u_r))^T ,$$

$$\mathbf{0}_r = (0, \ldots, 0)^T ,$$

and

$$C = (C_{ij})$$

where

$$C_{ij} = C_{ji} = \frac{u_i (1 - u_j)}{fQ(u_i) \, fQ(u_j)} , \quad 1 \le i \le j \le r . \qquad (2.1.3)$$

2.2 The Weibull and Extreme Value Distributions
In this dissertation the basic model for $Q(u)$ is to assume

$$Q(u) = \mu + \sigma Q_0(u)$$

where $Q_0(u)$ is a completely specified quantile function except for a
possibly unknown shape parameter and $\mu$ and $\sigma$ are the location
and scale parameters of $Q$. Two quantile functions that have proven to be
particularly useful in a variety of statistical problems are those of
the three-parameter Weibull distribution and the extreme value
distribution. The Weibull and extreme value distributions have been used
as models in reliability, survival studies, quality control, hydrology,
etc. (see Dubey 1967; Hassanein 1971; Johnson and Kotz 1970).


Definition: A continuous random variable Y is said to have the
three-parameter Weibull distribution with parameters $\mu$, $\sigma$,
and $c$ if

$$F(y) = \begin{cases} 0 & \text{if } y < \mu , \\ 1 - \exp\{-[(y - \mu)/\sigma]^c\} & \text{if } y \ge \mu , \end{cases} \qquad (2.2.1)$$

where $\sigma$, $c$ are greater than zero.

The parameters $\mu$, $\sigma$, and $c$ ($= 1/\gamma$) are the location,
scale, and shape parameters, respectively. For a random variable
following a three-parameter Weibull distribution we have

$$Q(u) = \mu + \sigma [-\log(1 - u)]^\gamma , \quad \gamma = 1/c , \quad 0 \le u < 1 ,$$

$$Q_0(u) = [-\log(1 - u)]^\gamma ,$$

$$f_0(y) = c \, y^{c-1} \exp(-y^c) ,$$

and

$$f_0 Q_0(u) = (1/\gamma)(1 - u)[-\log(1 - u)]^{1 - \gamma} .$$
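The Weibull formulas above translate directly into code. The following Python sketch (hypothetical function names, illustrative parameter values) also checks numerically that $f_0 Q_0(u) = 1/q_0(u)$, as Theorem 2.1.2 requires, using a central difference for $q_0$.

```python
import math


def weibull_q0(u, gamma):
    """Q0(u) = [-log(1 - u)]**gamma."""
    return (-math.log(1.0 - u)) ** gamma


def weibull_f0q0(u, gamma):
    """f0Q0(u) = (1/gamma) (1 - u) [-log(1 - u)]**(1 - gamma)."""
    return (1.0 / gamma) * (1.0 - u) * (-math.log(1.0 - u)) ** (1.0 - gamma)


def weibull_quantile(u, mu, sigma, gamma):
    """Q(u) = mu + sigma * Q0(u)."""
    return mu + sigma * weibull_q0(u, gamma)


def weibull_cdf(y, mu, sigma, c):
    """Three-parameter Weibull cdf (2.2.1) with shape c = 1/gamma."""
    if y < mu:
        return 0.0
    return 1.0 - math.exp(-(((y - mu) / sigma) ** c))


# Illustrative values only.
u, gamma, h = 0.3, 0.5, 1e-6
q0_numeric = (weibull_q0(u + h, gamma) - weibull_q0(u - h, gamma)) / (2 * h)
product = weibull_f0q0(u, gamma) * q0_numeric      # should be close to 1
roundtrip = weibull_cdf(weibull_quantile(u, 3.0, 2.0, gamma), 3.0, 2.0, 1 / gamma)
```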

By varying the shape parameter $\gamma$, one can fit a wide range of
unimodal distributional shapes from skewed right to almost symmetric
to skewed left. The role of $\mu$ is as a threshold value (or starting
value), i.e. $Q(0) = \mu$, rather than as a measure of central tendency.
Figures 2.A to 2.J display the Weibull $Q_0$ and $f_0 Q_0$ functions for
$\gamma = .1(.1)1$.

By letting $c = 1$ (or $\gamma = 1$) in (2.2.1) we obtain the exponential
distribution. For $c < 3$ (or $\gamma > .333$) the distribution is skewed
right. When $3 < c < 4$ (or $.25 < \gamma < .333$) the distribution looks
more symmetric. For $c = 3.6$ the Weibull density is similar to that
of the normal giving $\sqrt{b_1} = .0006$ and $b_2 = 2.7167$ (Kübler 1979)
where $\sqrt{b_1}$ is the skewness measure and $b_2$ measures kurtosis
(see Rao 1973, p. 101). For $c > 4$ (or $\gamma < .25$) the distribution is
skewed left.
Definition: If the continuous random variable Y has the
three-parameter Weibull distribution, then $X = -\log(Y - \mu)$ is said
to have the extreme value distribution.

Since $X = -\log(Y - \mu)$ we have by Theorem 2.1.1 that
$Q_X(u) = -\log Q_{Y-\mu}(1 - u)$ and

$$F(y) = \exp(-\exp\{-[(y - \mu')/\sigma']\}) , \quad -\infty < y < \infty ,$$

$$Q(u) = \mu' + \sigma' \{-\log[-\log(u)]\} , \quad 0 \le u \le 1 ,$$

$$f_0(y) = \exp\{-[y + \exp(-y)]\} ,$$

and

$$f_0 Q_0(u) = -u \log u$$

where $\mu' = -\log \sigma$ and $\sigma' = \gamma$, and $\sigma$ and
$\gamma$ are the Weibull scale and shape parameters, respectively.

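The Cramer Rao Lower Bound (CRLB, see Rao 1973, pp. 324-331)
for the variance of unbiased estimators
$\hat\theta = (\hat\mu, \hat\sigma, \hat c)^T$ of
$\theta = (\mu, \sigma, c)^T$ is given by Kübler (1979) as:

$$\mathrm{Var}(\hat\theta) \ge (1/n) \, I^{-1}(\theta)$$

where for matrices A and B the notation $A \ge B$ means that $A - B$ is
positive semidefinite and $I(\theta) = (I_{ij}(\theta))$ is the Fisher
information matrix (see Rao 1973) of $\mu$, $\sigma$, $c$ and is given by

$$I_{11}(\theta) = \left(\frac{c - 1}{\sigma}\right)^2 \Gamma(h_2) \quad \text{provided } c > 2 ,$$

$$I_{22}(\theta) = \left(\frac{c}{\sigma}\right)^2 ,$$

$$I_{33}(\theta) = H_1 c^{-2} ,$$

$$I_{12}(\theta) = I_{21}(\theta) = \frac{c(c - 1)}{\sigma^2} \Gamma(h_1) \quad \text{provided } c > 1 ,$$

$$I_{13}(\theta) = I_{31}(\theta) = -\frac{c - 1}{c \sigma} \Gamma(h_1) H_2 \quad \text{provided } c > 1 ,$$

and

$$I_{23}(\theta) = I_{32}(\theta) = -\frac{\psi(2)}{\sigma} ,$$

where

$$H_1 = \psi'(1) + \psi^2(2) \doteq 1.82368066 ,$$

$$H_2 = \psi(h_1) + 1 ,$$

$$h_j = 1 - j/c , \quad j = 1, 2 ,$$

$$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t} \, dt \text{ is the Gamma function,}$$

$$\psi(x) = d \log \Gamma(x)/dx = \Gamma'(x)/\Gamma(x) ,$$

and

$$\psi'(x) = d\psi(x)/dx .$$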
3. IDENTIFICATION OF DISTRIBUTIONAL SHAPE

The identification of the shape of an unknown distribution
is the first stage of analysis as we perceive the k sample quantile
regression problem. In this section we describe three quantile
function approaches to identifying a distributional shape.

In 3.1 we discuss quantile-box plots and present the nonparametric
data modeling and goodness-of-fit procedures for one population
developed by Parzen (1979a) to determine $Q_0$. In 3.2 we discuss
a parametric approach to estimating the shape parameter $\gamma$ in the
model:

$$Q_i(u) = \mu_i + \sigma_i (Q_0^*(u))^\gamma , \quad i = 1, \ldots, k ,$$

where $Q_0^*(\cdot)$ is a completely specified quantile function. The
procedure is a generalization of one proposed by Dubey (1967) for
estimating the shape parameter of the Weibull distribution. In
3.3 a procedure for either determining $Q_0$ for $k$ populations or
for estimating $\gamma$ is discussed. The procedure is based on the
goodness-of-fit procedures of Parzen.

3.1  Determination  of  Q0 

In this section we describe the quantile-box plot approach
(Parzen 1979a) to represent data and compare $k$ samples of data.
We discuss how quantile-box plots can assist one in determining
$Q_0$ for the model

$$Q_i(u) = \mu_i + \sigma_i Q_0(u) , \quad i = 1, \ldots, k .$$

The nonparametric data modeling techniques and goodness of fit
procedures for one population developed by Parzen (1979a) are
also discussed.

Quantile-box plots are described by Parzen (1979a) as a
"quick and dirty" approach to exploratory data analysis. The
technique is a variation of the box and whiskers technique
introduced by Tukey (1977). Quantile-box plots assist one in determining
the qualitative characteristics of $Q(u)$, e.g. skewness, symmetry,
modality, and tail behavior. However, the study of quantile-box
plots is an imprecise science; much of the interpretative value
of the plots, especially for small sample sizes, depends on the
predilections of the investigator.

A quantile-box plot of a sample of data consists of a graph
of $\tilde Q(u)$ (we use the shifted piecewise linear version
$\tilde Q_S(u)$) as a function on the unit interval $0 \le u \le 1$,
on which a series of boxes is superimposed. The boxes have as
vertices $(p, \tilde Q(p))$, $(p, \tilde Q(1 - p))$,
$(1 - p, \tilde Q(1 - p))$, and $(1 - p, \tilde Q(p))$.
One usually chooses $p = .25$, $.125$, and $.0625$. Within the H box
($p = .25$) one can draw a horizontal median line through $\tilde Q(.5)$.
Parzen (1979b, p. 243) gives an approximate 95% confidence interval
for the median:

$$\tilde Q(.5) \pm (2/\sqrt{n}) [\tilde Q(.75) - \tilde Q(.25)] .$$
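The box vertices and the approximate median interval can be computed from any quantile function, as in the Python sketch below. The function names are hypothetical, and the quantile function passed in is an arbitrary callable; the uniform quantile function $Q(u) = u$ is used here only as an easily checked stand-in for an empirical quantile function.

```python
import math


def box_vertices(q, p):
    """Vertices of the quantile box for a quantile function q and 0 < p < .5."""
    return [(p, q(p)), (p, q(1 - p)), (1 - p, q(1 - p)), (1 - p, q(p))]


def median_interval(q, n):
    """Approximate 95% interval Q(.5) +/- (2/sqrt(n)) [Q(.75) - Q(.25)]."""
    half_width = (2.0 / math.sqrt(n)) * (q(0.75) - q(0.25))
    return q(0.5) - half_width, q(0.5) + half_width


def uniform_q(u):
    """Stand-in quantile function Q(u) = u, used only for illustration."""
    return u


# H, E, and D boxes correspond to p = .25, .125, .0625.
boxes = {name: box_vertices(uniform_q, p)
         for name, p in (("H", 0.25), ("E", 0.125), ("D", 0.0625))}
lower, upper = median_interval(uniform_q, n=100)
```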
One can use the quantile-box plot technique to classify the
distribution  of  data  as  normal  shaped,  skewed  right  vs.  skewed  left, 
or  long-tailed  vs.  short-tailed.  One  can  detect  modes  as  flat  spots 
in  Q(u)  and  the  presence  of  two  groups  as  jumps  in  Q(u).  Intervals 
of  sharp  rise  outside  the  D  box  (p  =  .0625)  cause  one  to  suspect  the 
presence  of  outliers  or  a  long-tailed  distribution.  Skewness  and 
symmetry  can  be  checked  by  inspecting  the  shape  of  Q(u)  within  the 
boxes  and  also  by  examining  the  position  of  the  H  box  within  the  E 
box  (p  =  .125)  and  the  E  box  within  the  D  box. 

One  can  use  multiple  quantile-box  plots  to  check  if  k  samples 
of  data  have  homogeneous  shapes  except  for  a  location-scale  shift. 
Figure  6.C  (p.  87)  shows  the  quantile-box  plots  for  four  samples  of 
the  Hogg  (1975)  professors'  salary  data.  Comments  on  the  plots  are 
given  in  Section  6.1. 

Parzen's approach to determining $Q_0$. For a random sample
$\{X_1, \ldots, X_n\}$ of a continuous random variable X with cdf $F(x)$
and quantile function $Q(u)$, one hypothesizes a location-scale model

$$H_0 : Q(u) = \mu + \sigma Q_0(u) . \qquad (3.1.1)$$

Parzen (1979a) discusses procedures which provide a test of $H_0$ and
also yield estimators of the true $fQ$ function when $H_0$ is rejected.
The situation of interest is when $Q_0$ is unknown and one would like
to test $H_0$ for various specifications of $Q_0$ (e.g. normal vs.
logistic vs. Cauchy).
Parzen defines the following quantities:

1) the transformation density, $d(u)$, defined by

$$d(u) = (1/\sigma_0) \, f_0 Q_0(u) \, q(u) , \quad 0 \le u \le 1 ,$$

where

$$\sigma_0 = \int_0^1 f_0 Q_0(u) \, q(u) \, du ,$$

$$f_0 Q_0(u) = f_0(Q_0(u)) ,$$

$$q(u) = dQ(u)/du ;$$

2) the transformation distribution, $D(u)$, defined by

$$D(u) = \int_0^u d(t) \, dt ;$$

3) the complex-valued transformation correlations, $\rho(v)$,
$v = 0, \pm 1, \pm 2, \ldots$, defined by

$$\rho(v) = \int_0^1 \exp(2 \pi i u v) \, d(u) \, du .$$


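One can estimate the above quantities using:

4) $\tilde d(u) = (1/\tilde\sigma_0) \, f_0 Q_0(u) \, \tilde q(u)$,
$0 \le u \le 1$, where

$$\tilde\sigma_0 = \int_0^1 f_0 Q_0(u) \, \tilde q(u) \, du ,$$

and $\tilde q(u)$ is defined by (2.1.2);

5) $\tilde D(u) = \int_0^u \tilde d(t) \, dt$;

6) $\tilde\rho(v) = \int_0^1 \exp(2 \pi i u v) \, \tilde d(u) \, du$,
$v = 0, \pm 1, \ldots$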
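A rough numerical sketch of the raw estimators of $d(u)$ and $D(u)$ is given below in Python. It is only an approximation under stated assumptions: the integrals are replaced by Riemann sums over the $n - 1$ intervals on which the empirical quantile density is constant, $f_0 Q_0$ is evaluated at the interval midpoints $j/n$, and the function name is hypothetical. The hypothesized $Q_0$ here is the standard normal, and the "data" are exact normal quantiles so that the estimated density should be close to 1.

```python
from statistics import NormalDist


def raw_transformation_density(x_sorted, f0q0):
    """Return (midpoints, d, D, sigma0), with d on the intervals
    ((j - .5)/n, (j + .5)/n) and D accumulated at their right ends."""
    n = len(x_sorted)
    mids = [j / n for j in range(1, n)]
    q = [n * (x_sorted[j] - x_sorted[j - 1]) for j in range(1, n)]  # (2.1.2)
    w = [f0q0(u) * qj for u, qj in zip(mids, q)]
    sigma0 = sum(w) / n            # Riemann sum for the integral of f0Q0 * q
    d = [wj / sigma0 for wj in w]
    D, total = [], 0.0
    for dj in d:
        total += dj / n            # each interval has width 1/n
        D.append(total)
    return mids, d, D, sigma0


nd = NormalDist()
n = 200
x = [nd.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]   # idealized sample
mids, d, D, sigma0 = raw_transformation_density(
    x, lambda u: nd.pdf(nd.inv_cdf(u)))
# Largest gap between the estimated D and the uniform distribution function.
max_dev = max(abs(Dj - (u + 0.5 / n)) for u, Dj in zip(mids, D))
```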

One can obtain smoothed estimators of the above quantities using
autoregressive methods:

7) $\hat d_m(u) = \hat K_m \, |\hat g_m(\exp(2 \pi i u))|^{-2}$,

where

$$\hat g_m(z) = 1 + \hat a_m(1) z + \ldots + \hat a_m(m) z^m ,$$

$\hat a_m(1), \ldots, \hat a_m(m)$ satisfy the normal equations

$$\tilde\rho(-v) + \hat a_m(1) \tilde\rho(1 - v) + \ldots + \hat a_m(m) \tilde\rho(m - v) = 0 , \quad v = 1, \ldots, m ,$$

$$\hat K_m = 1 + \hat a_m(1) \tilde\rho(1) + \ldots + \hat a_m(m) \tilde\rho(m) ,$$

and $m$ is the order of the autoregressive smoothing;

8) $\hat D_m(u) = \int_0^u \hat d_m(t) \, dt$;

9) $\hat{fQ}_m(u) = \dfrac{|\hat g_m(\exp(2 \pi i u))|^2 \, f_0 Q_0(u)}{\int_0^1 |\hat g_m(\exp(2 \pi i u))|^2 \, f_0 Q_0(u) \, \tilde q(u) \, du}$ .


Parzen  proposes  minimizing  the  CAT  criterion  defined  by 


CAT (m)  =  1/n  E 

j=1 
to determine an "optimal" order $m$ of autoregressive smoothing. One
can also select an appropriate order of smoothing by visually
checking how well $\hat D_m(u)$ fits $\tilde D(u)$.

The hypothesis $H_0 : Q(u) = \mu + \sigma Q_0(u)$ is equivalent to any of
the following hypotheses:

1) $d(u) = 1$ ,

2) $D(u) = u$ ,

3) $\rho(v) = 0$ for $v \ne 0$ .

The following test statistics could be used to test $H_0$:

1) $\max \tilde d(u)$ or $\int_0^1 \log \tilde d(u) \, du$, $0 \le u \le 1$

2) $\max |\tilde D(u) - u|$, $0 \le u \le 1$

3) $|\tilde\rho(v)|^2$ , $v \ne 0$ .

Parzen (1979a) provides references for the properties of these
statistics. When CAT selects $m = 0$, Parzen regards it as
confirmation of $H_0$.

A  useful  diagnostic  discussed  by  Parzen  (1980)  is  the  p 

mode  or  mode  percentile.  It  is  defined  to  be  the  value  of  u  at 

which  fQ(u)  achieves  its  mode  (or  maximum  value)  when  fQ(-)  is 
unimodal.  When  the  p  mode  exceeds  .5  the  distribution  is  skewed 
left  and  when  the  p  mode  is  less  than  .5  the  distribution  is 
skewed  right. 

The function $\hat{fQ}_m(u)$ is a useful estimator of $fQ(u)$ even when
one has sufficient evidence to reject $H_0$. By examining the interval
of $u$ values for which $\hat D_m(u)$ (or $\tilde D(u)$) is approximately
linear in $u$, one can detect which parts of the data seem to fit the
hypothesized $Q_0$ function.

The computer package ONESAM (Parzen and White 1979) provides
plots of $\tilde Q(u)$, $\tilde q(u)$, and $\tilde{fQ}(u)$. By specifying
any of several familiar $Q_0$ functions, plots of $\tilde d(u)$,
$\tilde D(u)$, $|\tilde\rho(v)|^2$, $\hat D_m(u)$, and $\hat{fQ}_m(u)$
(for several orders $m$) are produced along with the goodness-of-fit
diagnostics discussed above.

3.2  Estimation  of  the  Shape  Parameter 

Motivated by the fact that $Q(u)$ is of the form

$$Q(u) = \mu + \sigma (Q_0^*(u))^\gamma \qquad (3.2.1)$$

for X having the three parameter Weibull distribution (with
$Q_0^*(u) = -\log(1 - u)$) and the three parameter lognormal distribution
(with $Q_0^*(u) = \exp[\Phi^{-1}(u)]$, see Johnson and Kotz 1970, p. 112),
we investigate the estimation of the shape parameter in (3.2.1).

We first find an estimator $\tilde\gamma$ of $\gamma$ for the one sample
case and then show how to pool estimators
$\tilde\gamma_1, \ldots, \tilde\gamma_k$ obtained from samples
from $k$ populations, the $i$th of which has quantile function

$$Q_i(u) = \mu_i + \sigma_i (Q_0^*(u))^\gamma , \qquad (3.2.2)$$

to produce an estimator of $\gamma$.

Theorem 3.2.1: Let $0 < u_1 < u_2 < u_3 < 1$ be values satisfying

$$Q_0^*(u_2) = (Q_0^*(u_1) \, Q_0^*(u_3))^{1/2} .$$

Then

$$\gamma = \frac{2 \log\{[Q(u_3) - Q(u_2)]/[Q(u_2) - Q(u_1)]\}}{\log[Q_0^*(u_3)/Q_0^*(u_1)]} . \qquad (3.2.3)$$

Proof: For $u_1 < u_2 < u_3$ we have

$$Q(u_j) = \mu + \sigma (Q_0^*(u_j))^\gamma , \quad j = 1, 2, 3 .$$

Then

$$\frac{Q(u_3) - Q(u_2)}{Q(u_2) - Q(u_1)} = \frac{(Q_0^*(u_3))^\gamma - (Q_0^*(u_2))^\gamma}{(Q_0^*(u_2))^\gamma - (Q_0^*(u_1))^\gamma} .$$

Since $Q_0^*(u_2) = [Q_0^*(u_1) \, Q_0^*(u_3)]^{1/2}$, then

$$\frac{Q(u_3) - Q(u_2)}{Q(u_2) - Q(u_1)} = \left[\frac{Q_0^*(u_3)}{Q_0^*(u_1)}\right]^{\gamma/2}$$

and

$$\log\{[Q(u_3) - Q(u_2)]/[Q(u_2) - Q(u_1)]\} = (\gamma/2) \log[Q_0^*(u_3)/Q_0^*(u_1)] .$$

Hence

$$\gamma = \frac{2 \log\{[Q(u_3) - Q(u_2)]/[Q(u_2) - Q(u_1)]\}}{\log[Q_0^*(u_3)/Q_0^*(u_1)]} .$$
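The identity of Theorem 3.2.1 can be checked numerically. The Python sketch below assumes the Weibull case $Q_0^*(u) = -\log(1 - u)$; the function names and the values of $\mu$, $\sigma$, $\gamma$, $u_1$, $u_3$ are hypothetical and chosen only for illustration. Given exact quantiles, the formula recovers $\gamma$ regardless of $\mu$ and $\sigma$.

```python
import math


def middle_u_weibull(u1, u3):
    """u2 such that Q0*(u2) = sqrt(Q0*(u1) Q0*(u3)) for Q0*(u) = -log(1 - u)."""
    return 1.0 - math.exp(-math.sqrt(math.log(1.0 - u1) * math.log(1.0 - u3)))


def shape_from_quantiles(q1, q2, q3, u1, u3):
    """Three-quantile formula for gamma, with Q0*(u) = -log(1 - u)."""
    d = math.log(math.log(1.0 - u3) / math.log(1.0 - u1))  # log[Q0*(u3)/Q0*(u1)]
    return 2.0 * math.log((q3 - q2) / (q2 - q1)) / d


# Illustrative parameter values only.
mu, sigma, gamma = 5.0, 2.0, 0.4
u1, u3 = 0.1, 0.9
u2 = middle_u_weibull(u1, u3)
q1, q2, q3 = (mu + sigma * (-math.log(1.0 - u)) ** gamma for u in (u1, u2, u3))
recovered = shape_from_quantiles(q1, q2, q3, u1, u3)
```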
Theorem 3.2.2: Let

$$\tilde\gamma = \frac{2 \log\{[\tilde Q(u_3) - \tilde Q(u_2)]/[\tilde Q(u_2) - \tilde Q(u_1)]\}}{\log[Q_0^*(u_3)/Q_0^*(u_1)]} . \qquad (3.2.4)$$

Then $\sqrt{n} (\tilde\gamma - \gamma) \xrightarrow{d} N(0, V(\gamma))$
where

$$V(\gamma) = \frac{4}{d^2} \left[ \sigma_{11} d_{21}^2 + \sigma_{22} (d_{21} + d_{32})^2 + \sigma_{33} d_{32}^2 - 2 \sigma_{12} (d_{21}^2 + d_{21} d_{32}) + 2 \sigma_{13} d_{21} d_{32} - 2 \sigma_{23} (d_{32}^2 + d_{21} d_{32}) \right] , \qquad (3.2.5)$$

$$d = \log[Q_0^*(u_3)/Q_0^*(u_1)] ,$$

$$d_{ij} = 1/[(Q_0^*(u_i))^\gamma - (Q_0^*(u_j))^\gamma] ,$$

$$\sigma_{ij} = \frac{\min(u_i, u_j) - u_i u_j}{f_0 Q_0^*(u_i) \, f_0 Q_0^*(u_j)} ,$$

where $f_0 Q_0^*$ is the $fQ$ function corresponding to $Q_0^*$.

Proof: By Theorem 2.1.4, we have

$$\sqrt{n} (\tilde Q(u_1) - Q(u_1), \; \tilde Q(u_2) - Q(u_2), \; \tilde Q(u_3) - Q(u_3))^T \xrightarrow{d} N_3(\mathbf{0}_3, C)$$

where

$$C_{ij} = C_{ji} = u_i (1 - u_j)/(fQ(u_i) \, fQ(u_j)) , \quad 1 \le i \le j \le 3 .$$

Then since $\tilde\gamma = g(\tilde Q(u_1), \tilde Q(u_2), \tilde Q(u_3))$
and $\gamma = g(Q(u_1), Q(u_2), Q(u_3))$ where $g(x_1, x_2, x_3)$ is
defined by (3.2.3) and (3.2.4), we have (see Rao 1973, p. 387)

$$\sqrt{n} (\tilde\gamma - \gamma) \xrightarrow{d} N(0, V(\gamma))$$

where

$$V(\gamma) = t^T C \, t ,$$

$$t = (\partial g/\partial x_1, \; \partial g/\partial x_2, \; \partial g/\partial x_3)^T .$$

Since

$$\partial g/\partial x_1 = 2/[d (Q(u_2) - Q(u_1))] ,$$

$$\partial g/\partial x_2 = -(2/d) [1/(Q(u_3) - Q(u_2)) + 1/(Q(u_2) - Q(u_1))] ,$$

$$\partial g/\partial x_3 = 2/[d (Q(u_3) - Q(u_2))] ,$$

the theorem follows.
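The variance expression (3.2.5) can be evaluated as a quadratic form. The Python sketch below transcribes the formula as stated, for the Weibull case $Q_0^*(u) = -\log(1 - u)$, for which the density quantile function of $Q_0^*$ is $1 - u$. The function name is hypothetical, and $u_2$ is computed from $u_1$ and $u_3$ through the geometric-mean condition of Theorem 3.2.1. The two evaluations use the $(u_1, u_3)$ pairs listed in Table 3.1 for $\gamma = 1$ and $\gamma = .8$.

```python
import math


def asymptotic_variance(gamma, u1, u3):
    """Evaluate V(gamma) of (3.2.5) as written, with Q0*(u) = -log(1 - u)."""
    def a(u):
        return -math.log(1.0 - u)                 # Q0*(u)

    u2 = 1.0 - math.exp(-math.sqrt(a(u1) * a(u3)))
    u = (u1, u2, u3)
    f = [1.0 - ui for ui in u]                    # fQ function of Q0*: 1 - u

    def sig(i, j):
        return (min(u[i], u[j]) - u[i] * u[j]) / (f[i] * f[j])

    d = math.log(a(u3) / a(u1))
    d21 = 1.0 / (a(u2) ** gamma - a(u1) ** gamma)
    d32 = 1.0 / (a(u3) ** gamma - a(u2) ** gamma)
    t = (d21, -(d21 + d32), d32)                  # gradient up to the factor 2/d
    quad = sum(t[i] * sig(i, j) * t[j] for i in range(3) for j in range(3))
    return 4.0 * quad / d ** 2


v_exponential = asymptotic_variance(1.0, 0.0064, 0.9795)
v_gamma_08 = asymptotic_variance(0.8, 0.0002, 0.9225)
```

Expanding the quadratic form term by term gives exactly the six terms inside the brackets of (3.2.5), so the double sum is only a compact way of writing the same expression.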

Remarks on Theorem 3.2.2:

1) The estimator $\tilde\gamma$ and its asymptotic distribution is
independent of $\mu$ and $\sigma$.

2) Theoretically one can choose optimal values of $u_1$, $u_2$, $u_3$
which minimize the variance of $\tilde\gamma$. The values will be a
function of $\gamma$ for a given $Q_0^*$. Table 3.1 gives optimal
values of $u_1$, $u_2$, $u_3$ which minimize $V(\gamma)$ for
$\gamma = .05, .1(.1)1, 2, 3$ when $Q_0^*(u) = -\log(1 - u)$ (i.e. the
Weibull distribution). The table also gives the minimum value of
$V(\gamma)$ and compares it to the CRLB for unbiased estimators of
$\gamma$. Figure 3.A plots the optimal values of $u_1$, $u_2$, $u_3$ as a
function of $\gamma$. See page 32 for further discussion of
Table 3.1.

3) Since $V(\gamma)$ is a continuous function of $\gamma$, a consistent
estimator $\tilde V(\gamma)$ of $V(\gamma)$ is obtained by substituting
$\tilde\gamma$ for $\gamma$ in (3.2.5).

Dubey (1967) gives the formula for an estimator of the shape
parameter of the three parameter Weibull distribution when $\sigma$ and
$\mu$ are unknown. The estimator of $1/\gamma = c$ is given by

$$\widetilde{(1/\gamma)} = \tilde c = \frac{\log[-\log(1 - u_3)] - \log[-\log(1 - u_1)]}{2 \, [\log(\tilde Q(u_3) - \tilde Q(u_2)) - \log(\tilde Q(u_2) - \tilde Q(u_1))]}$$

where

$$u_2 = 1 - \exp\{-[\log(1 - u_1) \log(1 - u_3)]^{1/2}\} ,$$

which is just the reciprocal of (3.2.4) using $Q_0^*(u) = -\log(1 - u)$.
Dubey states that the variance of $\tilde c$ depends on the true value
of $c$ and consequently he does not utilize optimal values of $u_1$ and
$u_3$ which minimize the variance of $\tilde c$.
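As an illustration of the three-quantile estimator of $c = 1/\gamma$ described above, the following Python sketch applies it to a simulated exponential sample (Weibull with $c = 1$). Everything specific here is an assumption made for the example: the function names, the use of the step version of the empirical quantile function, the seed, the sample size, and the parameter values.

```python
import math
import random


def step_quantile(x_sorted, u):
    """Step empirical quantile: X_{j;n} for (j-1)/n < u <= j/n."""
    n = len(x_sorted)
    return x_sorted[max(1, math.ceil(n * u)) - 1]


def dubey_shape(sample, u1, u3):
    """Estimate c = 1/gamma from three empirical quantiles of the sample."""
    x = sorted(sample)
    u2 = 1.0 - math.exp(-math.sqrt(math.log(1.0 - u1) * math.log(1.0 - u3)))
    q1, q2, q3 = (step_quantile(x, u) for u in (u1, u2, u3))
    num = math.log(-math.log(1.0 - u3)) - math.log(-math.log(1.0 - u1))
    den = 2.0 * (math.log(q3 - q2) - math.log(q2 - q1))
    return num / den


rng = random.Random(0)                    # fixed seed for reproducibility
mu, sigma, c = 3.0, 2.0, 1.0              # illustrative true parameters
sample = [mu + sigma * (-math.log(1.0 - rng.random())) ** (1.0 / c)
          for _ in range(20000)]
c_hat = dubey_shape(sample, 0.0064, 0.9795)
```

The location and scale never enter the computation, which reflects the fact that the estimator depends on the data only through ratios of quantile differences.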

When  one  lias  samples  from  k  populations  which  satisfy  the 
model  (3.2.2),  we  now  show  a  method  to  test  for  homogeneity  of 
shape  and  to  estimate  the  common  value  of  y.  Let  Q^(u)  be  the 
empirical  quantile  function  based  on  a  sample  of  size  n^  from 
population  i.  To  combine  estimators  y  ^ . y^  of  y  we  have 


Theorem  3 .2. 3 : 

k 

Let  II  =  V. 

i  =  l 


V(>) 


and 
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y  =  *  Vl/n  ’ 
v  i=l 


(3.2.6) 


where 


21og([Qi(u3)  -  Q.  (u2)  )  /  [(}  L  (u  .,)  - 


log[Qo*(ii3)/Qo*(u1)  ] 


n  a  ;;  n  , 
1  =  1 


Q0*(u2)  =  (Qo*(i«1)  Q0*<u3>)  ,  u1  <  u2  <  u3  , 


and  V (y )  Is  given  by  (3.2.5)  . 

If  the  k  populations  do  in  fact  have  the  same  shape  parameter  y  » 


»  11  1  • 


2)  /n(yp  -  y)  t  N(0,  V  (y )  ) 


where  as  n*«>,  the  ratio  n  /n  approaches  a  constant. 


Proof:  This  is  a  direct  application  of  Rao  (1973,  p.  389), 


Remarks  on  Theorem  3.2.3: 


1) The statistic $H$ can be used to test for homogeneity
of shape.

2) $\hat{\gamma}_p$ is a pooled estimator of the common shape parameter
of the $\gamma_i$'s.

3) A consistent estimator of $V(\gamma)$ is obtained by substituting
$\hat{\gamma}_p$ for $\gamma$ in (3.2.5).

4) Optimal values of $u_1$, $u_2$, and $u_3$ which minimize the
variance of $\hat{\gamma}_p$ will depend on the true value of $\gamma$. Table
3.1 can be used to find the optimal values of $u_1$, $u_2$, $u_3$
for a range of values of $\gamma$ and $Q_o^*(u) = -\log(1-u)$.
Information obtained from quantile-box plots or historical
data regarding the distributional shape may help to
determine an appropriate set of values of $u_1$, $u_2$, $u_3$.
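
The pooling step of Theorem 3.2.3 is simple arithmetic. The sketch below assumes the pooled estimator is the sample-size-weighted average of (3.2.6) and that the homogeneity statistic has the usual weighted sum-of-squares form associated with Rao (1973); both forms are assumptions made for illustration, and the variance `v` is supplied by the caller rather than computed from (3.2.5).

```python
def pooled_shape(gammas, sizes):
    """Sample-size-weighted average of the per-sample shape estimates."""
    n = sum(sizes)
    return sum(ni * gi for gi, ni in zip(gammas, sizes)) / n

def homogeneity_statistic(gammas, sizes, v):
    """Assumed form: H = sum_i n_i (gamma_i - gamma_p)^2 / V.
    Under a common shape, H would be referred to a chi-square
    distribution with k - 1 degrees of freedom."""
    gp = pooled_shape(gammas, sizes)
    return sum(ni * (gi - gp) ** 2 for gi, ni in zip(gammas, sizes)) / v

# Two hypothetical samples: estimates .5 and .7 from n = 10 and n = 30.
gamma_p = pooled_shape([0.5, 0.7], [10, 30])
h_stat = homogeneity_statistic([0.5, 0.7], [10, 30], v=0.15)
```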
Remarks on Table 3.1 and Figure 3.A:

Table 3.1 gives the optimal values of $u_1$, $u_2$, $u_3$ for the
estimator $\hat{\gamma}$ assuming $Q_o^*$ for the Weibull distribution, $V(\hat{\gamma})$ using
these u values, the CRLB for $\gamma$ when appropriate, and the asymptotic
relative efficiency (ARE) of $\hat{\gamma}$ defined by ARE($\hat{\gamma}$) = CRLB/$V(\hat{\gamma})$.

Figure 3.A plots the optimal values of $u_1$, $u_2$, $u_3$ as a function
of $\gamma$. The following trends are evident.

1) For $\gamma > 1.0$ (i.e. the distribution has no mode and is
highly skewed right), $u_3 = 1.0$. The optimal value
of $u_1$ goes from .63 (for $\gamma = 3.0$) to .01 (for $\gamma = 1.0$).

2) For $\gamma = 1.0$ (i.e. the exponential distribution), $u_3 = .98$
and $u_1 = .01$ are optimal.

3) As $\gamma$ goes from 1.0 to .3 (i.e. the distribution is
unimodal and goes from skewed right to almost symmetric),
the optimal value of $u_3$ goes from .98 (for $\gamma = 1.0$) to
.54 (for $\gamma = .3$). The optimal value of $u_1$ remains close
to 0.

4) When $\gamma = .3$ (i.e. the distribution is almost symmetric,
normal shaped), $u_1 = .0002$ and $u_3 = .54$ are optimal.

5) As $\gamma$ goes from .3 to .05 (i.e. the distribution goes from
almost symmetric to skewed left), the optimal value of $u_3$
goes from .54 (for $\gamma = .3$) to .52 (for $\gamma = .05$) and the
optimal value of $u_1$ remains close to 0.

6) The ARE increases from .53 for $\gamma = 1.0$ to .96 for $\gamma = .6$.
The ARE is .41 when $\gamma = .3$ and decreases rapidly to .01
for $\gamma = .05$.

7) The CRLB for $\gamma$ is inappropriate for $\gamma > 1.0$ since the
Fisher information measure for $\gamma > 1.0$ does not exist.
One might wish to compare $V(\hat{\gamma})$ to the asymptotic variance
of the maximum likelihood estimator of $\gamma$ based on a
censored sample (see Harter and Moore 1967).

Table 3.1  Optimal Values of $u_1$, $u_2$, $u_3$ for $\hat{\gamma}$; Weibull Distribution

$\gamma$     $u_1$     $u_2$     $u_3$     $V(\hat{\gamma})$     CRLB      ARE($\hat{\gamma}$)
3      .6283     .9452     .9998     .0341      NA        --
2      .2764     .7750     .9990     .4948      NA        --
1      .0064     .1461     .9795     1.0326     .5483     .531
.9     .0018     .0694     .9690     .8267      .4442     .538
.8     .0002     .0224     .9225     .5799      .3509     .605
.7     .00017    .0177     .8473     .3356      .2637     .8007
.6     .00017    .0162     .7902     .2059      .1074     .9537
.5     .00017    .0145     .7141     .1481      .1371     .9257
.4     .00017    .0131     .6380     .1266      .0377     .6982
.3     .00017    .0115     .5429     .1211      .0494     .4079
.2     .00010    .0067     .6274     .427       .0218     .1943
.1     .0001     .0057     .6274     .4645      .0055     .0118

Thus a strategy emerges. If one assumes the data to have a
Weibull distribution with an unknown shape parameter, one can select
almost optimal values of $u_1$ and $u_3$ (and consequently $u_2$) according
to the shape suggested by quantile-box plots or other graphical
techniques. If the shape is "super exponential" (i.e. very skewed
right and no mode), then select $u_3 = 1.0$ and $u_1$ in the range (.01, .63)
(a longer tail implies a larger value of $u_1$). If the data seem to
be exponential (i.e. skewed right and no mode), choose $u_3 = .97$ and
$u_1 = .0064$. If the data are unimodal and skewed right, values of $u_3$
in the range (.6, .95) (a longer tail implies a larger value of $u_3$)
and $u_1 = .0002$ will be almost optimal. If the data seem almost
symmetric, select $u_3 = .55$ and $u_1 = .0002$. If the data are unimodal
and skewed left, values of $u_3 = .54$ and $u_1 = .0002$ will be almost
optimal.

The estimation of $\gamma$ can also be done iteratively. An estimate
$\hat{\gamma}$ based on one set of $(u_1, u_2, u_3)$ values may suggest better values
of $(u_1, u_2, u_3)$.

In Section 6.2 we illustrate the use of the estimator $\hat{\gamma}_p$ of $\gamma$
for ten samples of data representing the tolerance of green sunfish
to thermal pollution.

3.3  A  Goodness-of-Fit  Approach  for  Determining  Distributional  Shape 

In this section we describe how to apply the one population
goodness-of-fit (GOF) procedures of Section 3.1 to the estimation
of the common value of $\gamma$ in the k population model

$$Q_i(u) = \mu_i + \sigma_i [Q_o^*(u)]^{\gamma} , \quad i = 1, \ldots, k ,$$

where $Q_o^*(\cdot)$ is assumed known and $\gamma$ is an unknown shape parameter,
and also to the identification of $Q_o$ in the model

$$Q_i(u) = \mu_i + \sigma_i Q_o(u) , \quad i = 1, \ldots, k .$$

Estimation of $\gamma$ using GOF. The proposed GOF procedure consists of
three parts:

1) Form a grid $(\gamma_{o1}, \ldots, \gamma_{om})$ of potential values for $\gamma$.

2) For each value of $\gamma_o$ in the grid, form estimates $\hat{\mu}_i(\gamma_o)$
and $\hat{\sigma}_i(\gamma_o)$ of $\mu_i$ and $\sigma_i$ using linear combinations of order
statistics (see Section 4.2 below). For the ith sample
form a transformed sample $\tilde{Q}_{i,\gamma_o}(u_j)$ defined by

$$\tilde{Q}_{i,\gamma_o}(u_j) = \left[\frac{\tilde{Q}_i(u_j) - \hat{\mu}_i(\gamma_o)}{\hat{\sigma}_i(\gamma_o)}\right]^{1/\gamma_o} ,$$

$$u_j = (j - .5)/n_i , \quad j = 1, \ldots, n_i , \quad i = 1, \ldots, k .$$

Next, pool the k transformed samples.

3) The hypothesis

$$H_o: Q_i(u) = \mu_i + \sigma_i (Q_o^*(u))^{\gamma_o} , \quad i = 1, \ldots, k$$

can be written as

$$H_o: \left[\frac{Q_i(u) - \mu_i}{\sigma_i}\right]^{1/\gamma_o} = Q_o^*(u) .$$

Under this hypothesis for a specified value of $\gamma_o$, we can
consider the pooled transformed sample as a random sample
of size $n = \sum_{i=1}^{k} n_i$ from a population with quantile function
$Q_o^*(u)$. Select the best value of $\gamma_o$ to be the value that
gives the best agreement of the pooled transformed sample
and $Q_o^*(u)$ according to the GOF criteria of Section 3.1.
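
The three-part grid search can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's implementation: the location and scale estimates come from an ordinary least-squares fit (a stand-in for the LCOS estimators), the agreement criterion is a Kolmogorov-Smirnov-type distance (a stand-in for the GOF criteria of Section 3.1), and $Q_o^*(u) = -\log(1-u)$ is assumed. All function names are hypothetical.

```python
import math

def ref_quantile(u):
    """Assumed reference quantile function Qo*(u) = -log(1 - u)."""
    return -math.log(1.0 - u)

def fit_location_scale(xs, zs):
    """Least-squares fit xs ~ mu + sigma * zs (stand-in for LCOS)."""
    n = len(xs)
    zbar, xbar = sum(zs) / n, sum(xs) / n
    sxz = sum((z - zbar) * (x - xbar) for z, x in zip(zs, xs))
    szz = sum((z - zbar) ** 2 for z in zs)
    sigma = sxz / szz
    return xbar - sigma * zbar, sigma

def transformed_sample(sample, gamma_o):
    """Step 2: standardize, then raise to the power 1/gamma_o."""
    xs = sorted(sample)
    n = len(xs)
    zs = [ref_quantile((j - 0.5) / n) ** gamma_o for j in range(1, n + 1)]
    mu, sigma = fit_location_scale(xs, zs)
    # Negative standardized values are clipped at 0 before the power.
    return [max((x - mu) / sigma, 0.0) ** (1.0 / gamma_o) for x in xs]

def gof_distance(samples, gamma_o):
    """Step 3: pool, map through Fo*(t) = 1 - exp(-t), compare to uniform."""
    pooled = sorted(t for s in samples for t in transformed_sample(s, gamma_o))
    n = len(pooled)
    return max(abs((1.0 - math.exp(-t)) - (l - 0.5) / n)
               for l, t in enumerate(pooled, start=1))

def select_shape(samples, grid):
    """Pick the grid value with the smallest distance."""
    return min(grid, key=lambda g: gof_distance(samples, g))

def exact_sample(mu, sigma, gamma, n):
    """Idealized sample: exact quantiles at u_j = (j - .5)/n."""
    return [mu + sigma * ref_quantile((j - 0.5) / n) ** gamma
            for j in range(1, n + 1)]

# Three idealized samples sharing gamma = 0.5 but not location or scale.
samples = [exact_sample(1.0, 2.0, 0.5, 40),
           exact_sample(5.0, 0.5, 0.5, 40),
           exact_sample(-3.0, 4.0, 0.5, 40)]
best = select_shape(samples, [0.25, 0.5, 1.0])
```

Each grid value requires refitting every sample, which is why a narrow grid matters in practice.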


Considerations in determining the grid $\{\gamma_{o1}, \ldots, \gamma_{om}\}$ are:

1) Knowledge of the qualitative properties of $Q_o(u)$ (e.g.
symmetry, right skewness, or left skewness, and tail
behavior) obtained from quantile-box plots can help one
select a narrow grid of $\gamma_o$ values.

2) Computationally the procedure is expensive.

3) If one specifies $Q_o^*(u) = -\log(1-u)$ (i.e. $Q_o(u)$ is in
the Weibull family), tables of optimal spacings and
coefficients for the estimation of $\mu$ and $\sigma$ using linear
combinations of order statistics are available for limited
values of $\gamma$. Programs to compute the optimal spacings and
coefficients for a wide range of $\gamma$ values should be made
available.
is equivalent to

$$H_o: \frac{Q_i(u) - \mu_i}{\sigma_i} = Q_o(u) .$$

Thus under this hypothesis for a specified $Q_o$, we can
consider the pooled transformed sample as a random sample
of size $n = \sum_{i=1}^{k} n_i$ from a population with quantile function
$Q_o(u)$. Select the best specification of $Q_o$ as the one that
gives the best agreement of the pooled transformed sample
and $Q_o(u)$ according to the criteria of Section 3.1.

How the misspecification of $\gamma_o$ affects the estimators
$\hat{\mu}$ and $\hat{\sigma}$ for the one population case is examined analytically in
Section 4.3. In Sections 6.1 and 6.2 we illustrate the techniques
of identifying $Q_o$ using the professors' salary data of Hogg (1975)
and the green sunfish data of Matis and Wehrly (1979).
4.  ESTIMATION  OF  LOCATION  AND  SCALE  PARAMETERS

The estimation of location and scale parameters is the second
stage in the k-sample quantile regression procedure as we perceive
it. Existing techniques for the estimation of location and scale
parameters in the one population case are also appropriate for the
estimation of location and scale parameters when there are k
populations satisfying

$$Q_i(u) = \mu_i + \sigma_i Q_o(u) , \quad i = 1, \ldots, k ,$$

where $\mu_i$ and $\sigma_i$ are the location and scale parameters respectively
of the ith population.

The location and scale parameter model for one population can
be written

$$F(x) = F_o\left(\frac{x - \mu}{\sigma}\right)$$

or

$$Q(u) = \mu + \sigma Q_o(u)$$

where $F_o$ and $Q_o$ are completely specified and $\mu$ and $\sigma$ are unknown
location and scale parameters, respectively. One would like
estimators of $\mu$ and $\sigma$ which are statistically efficient and
relatively simple to compute.

Estimators $\hat{\mu}$ and $\hat{\sigma}$ of $\mu$ and $\sigma$ based on linear combinations of
order statistics (LCOS) are defined by

$$\hat{\mu} = \sum_{j=1}^{r} a_j \tilde{Q}(u_j) , \qquad \hat{\sigma} = \sum_{j=1}^{r} b_j \tilde{Q}(u_j) ,$$

where $\tilde{Q}(u)$ is the empirical quantile function, r is the number of
values of $\tilde{Q}(u)$ (or the number of order statistics) used, and $a_j$, $b_j$,
j = 1, ..., r, are specified constants. Two approaches to the choice
of r, the $a_j$'s and $b_j$'s, and $u_j$'s are discussed in the next two
sections. The first approach (Section 4.1) is due to Ogawa (1951) and
Hassanein (1971, 1972) and the second (Section 4.2) is due to Eubank
(1979). Section 4.3 investigates the estimation of $\mu$ and $\sigma$ for the
Weibull distribution when the shape parameter $\gamma$ is misspecified.
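
A minimal sketch of how such estimators are evaluated once spacings and coefficients are in hand. It assumes the step-function definition of the empirical quantile function ($\tilde{Q}(u) = X_{(j)}$ for $(j-1)/n < u \le j/n$); the spacings and coefficients in the example are made-up placeholders, not values from any published table.

```python
import math

def empirical_quantile(sample, u):
    """Step-function empirical quantile: X_(j) for (j-1)/n < u <= j/n."""
    xs = sorted(sample)
    n = len(xs)
    j = min(max(math.ceil(n * u), 1), n)
    return xs[j - 1]

def lcos_estimates(sample, spacings, a, b):
    """mu-hat = sum a_j Q(u_j), sigma-hat = sum b_j Q(u_j)."""
    q = [empirical_quantile(sample, u) for u in spacings]
    mu_hat = sum(aj * qj for aj, qj in zip(a, q))
    sigma_hat = sum(bj * qj for bj, qj in zip(b, q))
    return mu_hat, sigma_hat

# Hypothetical r = 2 design: spacings .25 and .75 with placeholder weights.
mu_hat, sigma_hat = lcos_estimates(
    [1, 2, 3, 4, 5, 6, 7, 8], [0.25, 0.75], [0.5, 0.5], [-1.0, 1.0])
```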


4.1  Optimal  Linear  Combinations  of  Order  Statistics 


In this section we present the general work of Ogawa (1951)
and the work of Hassanein (1971, 1972) dealing with the selection
of optimal linear combinations of order statistics for the
simultaneous estimation of $\mu$ and $\sigma$.

Using the model

$$Q(u) = \mu + \sigma Q_o(u)$$

and recalling Theorem 2.1.4 regarding the asymptotic distribution of
$\tilde{Q}(u)$, asymptotically $\tilde{Q}(u_1), \ldots, \tilde{Q}(u_r)$ satisfy the conditions required
for the application of the Gauss-Markov Theorem. Thus generalized
least squares may be used to obtain asymptotically best linear
unbiased estimators (ABLUE's) of $\mu$ and/or $\sigma$. Ogawa (1951, see
$$\hat{\sigma} = (K_{11} K_{02} - K_{12} K_{01}) / \Delta . \qquad (4.1.5)$$


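Notice that these are just the generalized least squares estimators
of $\mu$ and $\sigma$ given by

$$\begin{pmatrix} \hat{\mu} \\ \hat{\sigma} \end{pmatrix} = (X^T C^{-1} X)^{-1} X^T C^{-1} y ,$$

where

$$X^T C^{-1} X = \begin{pmatrix} K_{11} & K_{12} \\ K_{12} & K_{22} \end{pmatrix} , \qquad X^T C^{-1} y = \begin{pmatrix} K_{01} \\ K_{02} \end{pmatrix} ,$$

where C is given by (2.1.3).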
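
The closed-form solution can be sketched in a few lines. The K quantities below use the standard Ogawa-type sums over adjacent spacings, with $u_0 = 0$, $u_{r+1} = 1$ and $f_oQ_o$ vanishing at both ends; these are assumed to correspond to (4.1.1)-(4.1.3), which are not reproduced in this part of the text. The logistic distribution is used in the example only because its $Q_o$ and $f_oQ_o$ are elementary.

```python
import math

def ogawa_ablue(us, ys, Qo, fQo):
    """ABLUE of (mu, sigma) from quantiles ys observed at spacings us.

    Qo(u) is the standardized quantile function and fQo(u) the
    density-quantile function fo(Qo(u)).
    """
    r = len(us)
    u = [0.0] + list(us) + [1.0]
    f = [0.0] + [fQo(v) for v in us] + [0.0]
    fx = [0.0] + [fQo(v) * Qo(v) for v in us] + [0.0]
    fy = [0.0] + [fQo(v) * y for v, y in zip(us, ys)] + [0.0]
    k11 = k12 = k22 = k01 = k02 = 0.0
    for j in range(1, r + 2):
        du = u[j] - u[j - 1]
        df = f[j] - f[j - 1]
        dfx = fx[j] - fx[j - 1]
        dfy = fy[j] - fy[j - 1]
        k11 += df * df / du
        k12 += df * dfx / du
        k22 += dfx * dfx / du
        k01 += df * dfy / du
        k02 += dfx * dfy / du
    delta = k11 * k22 - k12 ** 2
    mu_hat = (k22 * k01 - k12 * k02) / delta
    sigma_hat = (k11 * k02 - k12 * k01) / delta
    return mu_hat, sigma_hat

def logistic_Q(u):
    return math.log(u / (1.0 - u))

def logistic_fQ(u):
    return u * (1.0 - u)

# Noise-free quantiles of a logistic population with mu = 10, sigma = 2.
us = [0.1, 0.3, 0.5, 0.7, 0.9]
ys = [10.0 + 2.0 * logistic_Q(v) for v in us]
mu_hat, sigma_hat = ogawa_ablue(us, ys, logistic_Q, logistic_fQ)
```

With noise-free input the estimators reproduce the parameters exactly, which is a convenient check on the algebra.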

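For the simultaneous estimation of $\mu$ and $\sigma$ Ogawa (1951) defines
the ARE by

$$ARE(\hat{\mu}, \hat{\sigma}) = \frac{|I^{-1}(\theta)|}{Var(\hat{\mu})\,Var(\hat{\sigma}) - Cov^2(\hat{\mu}, \hat{\sigma})} ,$$

where

$$\theta = (\mu, \sigma)^T ,$$

$$I(\theta) = \begin{pmatrix} E\left[\left(\frac{f'(X)}{f(X)}\right)^2\right] & E\left[X\left(\frac{f'(X)}{f(X)}\right)^2\right] \\ E\left[X\left(\frac{f'(X)}{f(X)}\right)^2\right] & E\left[\left(\frac{X f'(X)}{f(X)}\right)^2\right] - 1 \end{pmatrix}$$

$$= \begin{pmatrix} \langle f_oQ_o(u), f_oQ_o(u)\rangle & \langle f_oQ_o(u), Q_o(u)f_oQ_o(u)\rangle \\ \langle f_oQ_o(u), Q_o(u)f_oQ_o(u)\rangle & \langle Q_o(u)f_oQ_o(u), Q_o(u)f_oQ_o(u)\rangle \end{pmatrix} \qquad (4.1.6)$$

and

$$\langle f(u), g(u)\rangle = \int_0^1 f'(u)\, g'(u)\, du .$$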


Examination of the equations for the estimators and their ARE's
reveals that the equations are functions of the spacings $u_1, \ldots, u_r$.
Thus the problem reduces to finding a set of optimal spacings
which maximize the ARE of the estimators. For certain distributions
the expressions for the ARE's are quite complicated and numerical
methods have to be used to find optimal or near optimal spacings.
For a given distribution the results are usually expressed as tables
of optimal spacings $u_1, \ldots, u_r$ and the corresponding coefficients
$a_1, \ldots, a_r$, $b_1, \ldots, b_r$ for the ABLUE's for various values of r.

Hassanein (1971) uses this procedure to find optimal spacings
and coefficients for the simultaneous estimation of $\mu$ and $\sigma$ for the
Weibull distribution. The tables he provides are a function of the
shape parameter, $c = 1/\gamma$, and he provides spacings and coefficients
for r = 2, 4, 6 order statistics. The values of c he considers are
c = 3(1)10(5)20. Subroutine QTOLSW uses Hassanein's tables for
r = 6 values to compute estimates of $\mu$ and $\sigma$ for any of the above
specified values of c.

Hassanein (1972) considers the problem of selecting optimal
spacings and coefficients for the simultaneous estimation of the
location and scale parameters of the extreme value distribution.
He provides optimal spacings for r = 1(1)10 order statistics. Let
us recall the property that if X has a Weibull distribution with
$\mu = 0$, then Y = log X has an extreme value distribution with the
location parameter $\mu' = \log \sigma$ and $\sigma' = \gamma$ where $\sigma$ and $\gamma$ are the scale
and shape parameters of the Weibull distribution. Thus one can use
the optimal spacings and coefficients for the extreme value
distribution to estimate the scale and shape parameters of the Weibull
distribution as long as the location parameter $\mu$ is known.
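
The Weibull to extreme value correspondence is easy to verify at the level of quantile functions. A small sketch, assuming the parameterization $Q(u) = \sigma(-\log(1-u))^{\gamma}$ with location 0 used in this chapter; the function names are illustrative.

```python
import math

def weibull_quantile(u, sigma, gamma):
    """Weibull quantile function with location 0."""
    return sigma * (-math.log(1.0 - u)) ** gamma

def extreme_value_quantile(u, loc, scale):
    """Extreme value quantile function: loc + scale * log(-log(1 - u))."""
    return loc + scale * math.log(-math.log(1.0 - u))

# log of a Weibull quantile equals an extreme value quantile with
# location log(sigma) and scale gamma.
sigma, gamma = 3.0, 0.25
pairs = [(math.log(weibull_quantile(u, sigma, gamma)),
          extreme_value_quantile(u, math.log(sigma), gamma))
         for u in (0.05, 0.25, 0.5, 0.75, 0.95)]
```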

There is an extensive literature on the use of linear
combinations of order statistics to estimate location and scale parameters
for many common distributions. The approach adopted by Ogawa,
Hassanein, and others centers on maximizing the ARE of the estimators.
In the next section we present another approach to the selection of a
set of spacings and coefficients for optimal location and scale
parameter estimation.

4.2  Asymptotically  Optimal  Linear  Combinations  of  Order  Statistics 

In this section we discuss the approach taken by Eubank (1979)
for the selection of asymptotically optimal LCOS for the simultaneous
estimation of $\mu$ and $\sigma$. Eubank formulates the problem within the
framework of continuous parameter time series regression. Using
Theorem 2.1.1 and the model

$$Q(u) = \mu + \sigma Q_o(u) ,$$

we have

$$\frac{\sqrt{n}}{\sigma}\, f_oQ_o(u)\,[\tilde{Q}(u) - \mu - \sigma Q_o(u)] \xrightarrow{L} B(u) \qquad (4.2.1)$$

where $\{B(u), u \in [0, 1]\}$ is a Brownian bridge process.

Then we can write a regression model

$$f_oQ_o(u)\tilde{Q}(u) = \mu f_oQ_o(u) + \sigma Q_o(u) f_oQ_o(u) + \sigma_B B(u) \qquad (4.2.2)$$

where $\sigma_B = \sigma/\sqrt{n}$ is estimated as a free parameter and is not constrained
to be related to $\sigma$. Eubank restates the problem of selecting a set of
spacings for the estimation of $\mu$ and $\sigma$ as that of selecting an optimal
design for a Brownian bridge process.

Definition 4.2.1: An r point design for a Brownian bridge process,
and consequently for $\{f_oQ_o(u)\tilde{Q}(u), 0 \le u \le 1\}$, is an r-tuple
$\{u_1, \ldots, u_r\}$ with $0 < u_1 < \ldots < u_r < 1$. Denote by $D_r$ the set of
all such r point designs.

Definition 4.2.2: For $T \in D_r$, let $\hat{\theta}_T$ denote the best linear unbiased
estimator (BLUE) of $\theta = (\mu, \sigma)$ based on observations taken according
to T. Let $\hat{\theta}$ denote the estimator of $\theta$ obtained using observations
over all of [0, 1].

Definition 4.2.3: A design sequence $\{T_r\}$, $T_r \in D_r$, is asymptotically
optimal for estimating $\theta$ if

$$\lim_{r \to \infty} \frac{|Var^{-1}(\hat{\theta}_{T_r})| - |Var^{-1}(\hat{\theta})|}{\inf_{T \in D_r} |Var^{-1}(\hat{\theta}_T)| - |Var^{-1}(\hat{\theta})|} = 1 .$$

Theorem 4.2.1 (Eubank 1979): Suppose $f_oQ_o(u)$ and $Q_o(u)f_oQ_o(u)$ have
the representations:

$$f_oQ_o(u) = -\int_0^1 (f_oQ_o(t))''\, K_B(u, t)\, dt ,$$

$$Q_o(u) f_oQ_o(u) = -\int_0^1 (Q_o(t) f_oQ_o(t))''\, K_B(u, t)\, dt$$

where

$$K_B(u, t) = \min(u, t) - ut$$

and $(g(t))''$ denotes $d^2 g(t)/dt^2$. Let

$$\psi(u) = -((f_oQ_o(u))'' , (Q_o(u) f_oQ_o(u))'')^T .$$

Then the density

$$h(u) = \frac{[\psi(u)^T I^{-1}(\theta)\, \psi(u)]^{1/3}}{\int_0^1 [\psi(u)^T I^{-1}(\theta)\, \psi(u)]^{1/3}\, du}$$

where $\theta = (\mu, \sigma)$ and $I(\theta)$ is defined by (4.1.6) generates
asymptotically optimal designs for the simultaneous estimation of $\mu$ and $\sigma$
(see remark 2 on p. 43).


Remarks on Theorem 4.2.1:

1) The asymptotic optimality of the design means that as r,
the number of spacings, grows large, then the spacings
generated by Theorem 4.2.1 lead to estimators with
approximately the same efficiency as estimators based on the
optimal set of r spacings.

2) Optimal designs are those that minimize the generalized
variance $|Var(\hat{\mu}_T, \hat{\sigma}_T)|$.

3) The density h generates the asymptotically optimal design
sequence $\{T_r\}_{r=1}^{\infty}$ where

$$T_r = \left\{ H^{-1}\left(\tfrac{1}{r+1}\right), \ldots, H^{-1}\left(\tfrac{r}{r+1}\right) \right\}$$

and

$$H(u) = \int_0^u h(t)\, dt .$$
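
Remark 3 turns a design density into spacings by inverting its distribution function. The following is a numerical sketch under the assumption that h can be evaluated pointwise on (0, 1); the midpoint rule, the grid size `m`, and the function name are illustrative choices and are not taken from Eubank (1979). The density need not be normalized, since the cumulative sum is rescaled by its total.

```python
def spacings_from_density(h, r, m=10000):
    """Return [H^{-1}(j/(r+1)) for j = 1..r] for a density h on (0, 1).

    H is accumulated with the midpoint rule on m equal cells and
    inverted by linear interpolation within the cell that contains
    each target value.
    """
    w = 1.0 / m
    cum = [0.0]
    for i in range(m):
        cum.append(cum[-1] + h((i + 0.5) * w) * w)
    total = cum[-1]
    spacings = []
    i = 0
    for j in range(1, r + 1):
        target = total * j / (r + 1)
        while cum[i + 1] < target:
            i += 1
        frac = (target - cum[i]) / (cum[i + 1] - cum[i])
        spacings.append((i + frac) * w)
    return spacings

# A uniform density gives equally spaced points.
uniform_design = spacings_from_density(lambda u: 1.0, 4)
# h(u) = 2u gives H(u) = u^2, so the spacings are square roots.
linear_design = spacings_from_density(lambda u: 2.0 * u, 3)
```

Because the midpoint rule never evaluates h at 0 or 1, the sketch tolerates densities that are unbounded at the endpoints, although accuracy there depends on the grid size.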


Eubank (1979) supplies general formulae for the coefficients $a_j$ and
$b_j$ for the estimation of $\mu$ and $\sigma$ using the asymptotically optimal
spacings to yield estimators

$$\hat{\mu} = \sum_{j=1}^{r} a_j \tilde{Q}\left(H^{-1}\left(\tfrac{j}{r+1}\right)\right) , \qquad \hat{\sigma} = \sum_{j=1}^{r} b_j \tilde{Q}\left(H^{-1}\left(\tfrac{j}{r+1}\right)\right) ,$$

where

$$a_j = [K_{22}(h) W_\mu(j,h) - K_{12}(h) W_\sigma(j,h)] / \Delta(h) ,$$

$$b_j = [K_{11}(h) W_\sigma(j,h) - K_{12}(h) W_\mu(j,h)] / \Delta(h) ,$$

$K_{ij}(h)$ is the same as $K_{ij}$ given by (4.1.1), (4.1.2) and (4.1.3) with
$u_j$ replaced by $H^{-1}(j/(r+1))$,

$$\Delta(h) = K_{11}(h) K_{22}(h) - [K_{12}(h)]^2 ,$$


W  (j,h) 

P 


Ktl(h) 


WH~  (ril))~  W""  <5l» 


f  q  or  (A)) 

w  (j,h)  =  - —  x 

°  K  (h) 


fo<lo  (ll~ ’  (ril'  ><!o  (H~‘  <^1’  >-f  o^o  C'1  < £l>  K  C'1  <£l> > 

""‘‘ril’  - 


""'‘f+l1  -  'rl<ril> 


While the approach is direct and, once H(u) is computed,
asymptotically optimal spacings can easily be found, many distributions
do not satisfy the required representation for $f_oQ_o(u)$ and
$Q_o(u) f_oQ_o(u)$ (see Eubank 1979, p. 116).

Eubank (1979) gives tables of asymptotically optimal spacings
and coefficients for the simultaneous estimation of $\mu$ and $\sigma$ based on
r = 2, 7, 9 order statistics for the normal and logistic distributions.
It has been suggested (Eubank, personal communication 1980) that
asymptotically optimal spacings for the simultaneous estimation of
$\mu$ and $\sigma$ can be generated for the Weibull distribution for certain
values of the shape parameter $\gamma$.

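4.3  Estimation of $\mu$ and $\sigma$ when $\gamma$ is Misspecified

In this section we examine analytically how the misspecification
of $\gamma$ affects estimators of $\mu$ and $\sigma$ based on LCOS for the Weibull
distribution. Using estimators $\hat{\mu}(\gamma_o)$ and $\hat{\sigma}(\gamma_o)$ based on a range of
specified values, $\gamma_o$, of the true value, $\gamma$, of the shape parameter,
we have:

$$Bias(\hat{\mu}(\gamma_o)) = E(\hat{\mu}(\gamma_o)) - \mu = \sum_{j=1}^{r} a_j (-\log(1-u_j))^{\gamma} ,$$

$$Bias(\hat{\sigma}(\gamma_o)) = E(\hat{\sigma}(\gamma_o)) - \sigma = \sum_{j=1}^{r} b_j (-\log(1-u_j))^{\gamma} - 1 ,$$

$$Var(\hat{\mu}(\gamma_o)) = \sum_j \sum_i a_i a_j\, Cov(\tilde{Q}(u_i), \tilde{Q}(u_j)) = \frac{\gamma^2}{n} \sum_j \sum_i \frac{a_i a_j [\min(u_i, u_j) - u_i u_j]}{(1-u_i)(1-u_j)[\log(1-u_i)\log(1-u_j)]^{1-\gamma}} ,$$

$$Var(\hat{\sigma}(\gamma_o)) = \sum_j \sum_i b_i b_j\, Cov(\tilde{Q}(u_i), \tilde{Q}(u_j)) = \frac{\gamma^2}{n} \sum_j \sum_i \frac{b_i b_j [\min(u_i, u_j) - u_i u_j]}{(1-u_i)(1-u_j)[\log(1-u_i)\log(1-u_j)]^{1-\gamma}} ,$$

$$MSE(\hat{\mu}(\gamma_o)) = E(\hat{\mu}(\gamma_o) - \mu)^2 = [Bias(\hat{\mu}(\gamma_o))]^2 + Var(\hat{\mu}(\gamma_o)) ,$$

$$MSE(\hat{\sigma}(\gamma_o)) = E(\hat{\sigma}(\gamma_o) - \sigma)^2 = [Bias(\hat{\sigma}(\gamma_o))]^2 + Var(\hat{\sigma}(\gamma_o)) .$$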
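
The bias, variance and MSE expressions of this section can be evaluated directly for any set of coefficients. The sketch below assumes a standardized Weibull population ($\mu = 0$, $\sigma = 1$, $Q_o^*(u) = -\log(1-u)$), which is what the "$-1$" in the bias of $\hat{\sigma}$ suggests, and the asymptotic covariance of sample quantiles with the factor $\gamma^2/n$. The two-point design in the example is hypothetical, chosen so that the estimators are exactly unbiased when $\gamma = .5$; it is not taken from Hassanein's tables.

```python
import math

def misspecification_summary(a, b, us, gamma, n):
    """Bias, variance and MSE of LCOS estimators when the true shape is gamma.

    a, b, us are the coefficients and spacings chosen for some
    specified gamma_o; gamma is the true shape; n is the sample size.
    """
    q = [-math.log(1.0 - u) for u in us]
    bias_mu = sum(aj * qj ** gamma for aj, qj in zip(a, q))
    bias_sigma = sum(bj * qj ** gamma for bj, qj in zip(b, q)) - 1.0

    def cov(i, j):
        # Asymptotic covariance of the sample quantiles at us[i], us[j].
        num = min(us[i], us[j]) - us[i] * us[j]
        den = (1.0 - us[i]) * (1.0 - us[j]) * (q[i] * q[j]) ** (1.0 - gamma)
        return gamma ** 2 * num / (n * den)

    r = len(us)
    var_mu = sum(a[i] * a[j] * cov(i, j) for i in range(r) for j in range(r))
    var_sigma = sum(b[i] * b[j] * cov(i, j) for i in range(r) for j in range(r))
    return {
        "bias_mu": bias_mu, "bias_sigma": bias_sigma,
        "var_mu": var_mu, "var_sigma": var_sigma,
        "mse_mu": bias_mu ** 2 + var_mu,
        "mse_sigma": bias_sigma ** 2 + var_sigma,
    }

# Hypothetical design with -log(1 - u) equal to 1 and 4.
us = [1.0 - math.exp(-1.0), 1.0 - math.exp(-4.0)]
a, b = [2.0, -1.0], [-1.0, 1.0]
correct = misspecification_summary(a, b, us, gamma=0.5, n=20)
wrong = misspecification_summary(a, b, us, gamma=1.0, n=20)
```

In this toy design the bias vanishes at $\gamma = .5$ and equals $-2$ and $+2$ at $\gamma = 1$, the same opposite-sign pattern for $\hat{\mu}$ and $\hat{\sigma}$ that the bias study reports.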

We calculate the values of each of the above properties of
$\hat{\mu}(\gamma_o)$ and $\hat{\sigma}(\gamma_o)$ using the values of $\gamma$ and $\gamma_o$: .333, .25, .20,
.167, .143, .125, .111, .10, .067 and .05. The coefficients $\{a_j\}$
and $\{b_j\}$ and optimal values $\{u_j\}$ are obtained from tables given by
Hassanein (1971) using r = 6 order statistics and the specified $\gamma_o$
value. The MSE (mean squared error) is computed for samples of size
n = 20 and 100.

The results are summarized in Tables 4.1, 4.2, 4.3, and 4.4.
The first entry in each cell of the various tables is for $\hat{\mu}(\gamma_o)$
and the second entry is for $\hat{\sigma}(\gamma_o)$. Figures 4.A, 4.B, 4.C and 4.D
represent plots of the properties of the estimators vs. the specified
value of $\gamma_o$. Each curve on the plots represents a distinct value
of $\gamma$ as indicated in the key. Plotting the curves for all the values
of $\gamma$ on the same set of axes facilitates comparison of the properties
for different misspecifications.

Remarks on Table 4.1 and Figure 4.A:

Table 4.1 and Figure 4.A present the results of a bias study of
the estimators $\hat{\mu}(\gamma_o)$ and $\hat{\sigma}(\gamma_o)$. The following remarks can be made:
Table 4.1  Bias of $\hat{\mu}(\gamma_o)$, $\hat{\sigma}(\gamma_o)$
(rows: true $\gamma$; columns: specified $\gamma_o$; for each $\gamma$ the first line
is for $\hat{\mu}(\gamma_o)$ and the second line is for $\hat{\sigma}(\gamma_o)$)

        .333    .250    .200    .167    .143    .125    .111    .100    .067    .050

.333    .000   -.239   -.519   -.814  -1.117  -1.424  -1.734  -2.046  -3.613  -5.201
       -.001    .264    .557    .360   1.168   1.479   1.792   2.105   3.684   5.270
.250    .163    .000   -.204   -.422   -.648   -.877  -1.109  -1.342  -2.522  -3.710
       -.185    .000    .215    .440    .670    .902   1.136   1.372   2.556   3.747
.200    .279    .159    .000   -.173   -.353   -.536   -.722   -.909  -1.855  -2.808
       -.300   -.169    .000    .179    .363    .548    .736    .925   1.875   2.830
.167    .367    .274    .144    .000   -.149   -.302   -.457   -.613  -1.404  -2.201
       -.393   -.288   -.149    .000    .153    .308    .465    .662   1.416   2.215
.143    .430    .3??       ?    .?27   -.001   -.132   -.2?5    .332  -1.375  -1.7?3
       -.466   -.376   -.257   -.130    .001    .134    .269    .404   1.088   1.775
.125    .490    .429    .334    .227    .115    .000   -.116   -.234   -.830  -1.431
       -.520   -.445   -.342   -.231   -.117    .000    .118    .236    .835   1.438
.111    .535    .484    .400    .305    .206    .104    .000   -.105   -.635  -1.171
       -.565   -.499   -.409   -.311   -.209   -.105    .000   -.106    .639   1.176
.100    .573    .529    .454    .369    .280    .138    .094    .000   -.479   -.961
       -.601   -.544   -.463   -.375   -.283   -.190   -.095    .000    .481    .965
.067    .697    .672    .624    .567    .508    .916    .384    .321    .000   -.324
       -.319   -.685   -.632   -.573   -.512   -.450   -.386   -.323    .000    .325
.050    .765    .749    .713    .671    .626    .530    .533    .435    .244    .000
       -.783   -.759   -.720   -.676   -.630   -.583   -.036   -.488   -.245    .000
1) Bias($\hat{\mu}(\gamma_o)$) and Bias($\hat{\sigma}(\gamma_o)$) for $\gamma = .333$ get large in
magnitude as $\gamma_o$ gets small. A consequence of this is that
if the data are almost symmetric and one specifies the data
to be skewed left ($\gamma_o$ small), then $\hat{\mu}(\gamma_o)$ may seriously
underestimate $\mu$ and $\hat{\sigma}(\gamma_o)$ may seriously overestimate $\sigma$.

2) For data that are skewed left, the risk of a seriously biased
estimator when $\gamma_o$ is misspecified is not as great as in (1).
When $\gamma = .05$ and one specifies $\gamma_o = .333$, Bias($\hat{\mu}(.333)$) = .8
and Bias($\hat{\sigma}(.333)$) = -.8.

3) Bias($\hat{\mu}(\gamma_o)$) $\approx$ -Bias($\hat{\sigma}(\gamma_o)$) for all values of $\gamma$.

4) One might wish to approximate the bias curves as a function
of $\gamma$ and $\gamma_o$. While this would be useful in general,
examination of the general formulae for $\hat{\mu}(\gamma_o)$ and $\hat{\sigma}(\gamma_o)$ given
by (4.1.4) and (4.1.5) does not offer much promise of this.


Remarks on Table 4.2 and Figure 4.B:

Table 4.2 and Figure 4.B present the results of a study of
Var($\hat{\mu}(\gamma_o)$) and Var($\hat{\sigma}(\gamma_o)$). The following remarks can be made:

1) It should be noted that the table and figure give
n Var($\hat{\mu}(\gamma_o)$) and n Var($\hat{\sigma}(\gamma_o)$) since the variance of each
estimator is a function of the sample size.

2) In general Var($\hat{\mu}(\gamma_o)$) and Var($\hat{\sigma}(\gamma_o)$) remain fairly constant
for $\gamma_o$ equal to .25 or .333 regardless of the true value $\gamma$.

3) Var($\hat{\mu}(\gamma_o)$) $\approx$ Var($\hat{\sigma}(\gamma_o)$).

4) For $\gamma = .333$ the variance is very large when $\gamma_o$ is
misspecified. When $\gamma = .05$, the variance when $\gamma_o$ is
misspecified is not significantly different from when $\gamma_o$ is
correctly specified.

Table 4.2  Variance of $\hat{\mu}(\gamma_o)$, $\hat{\sigma}(\gamma_o)$
(rows: true $\gamma$; columns: specified $\gamma_o$; for each $\gamma$ the first line
is for $\hat{\mu}(\gamma_o)$ and the second line is for $\hat{\sigma}(\gamma_o)$)

        .333    .250    .200    .167    .143    .125    .111    .100    .067    .050

.333    .358    .583    .968   1.502   2.184   3.014   3.994   5.123  13.024  24.688
        .423    .749   1.237   1.877   2.667   3.607   4.696   5.936  14.391  26.612
.250    .401    .472    .633    .9?0   1.354   1.806   7.335   2.940   7.112  13.192
        .360    .482    .733   1.068   1.432   1.974   2.542   3.186   7.555  13.834
.200    .471    .400    .530    .773    .967   1.260   1.601   1.990   4.646   8.482
        .343    .366    .517    .728    .989   1.300   1.658   2.064   4.304   8.724
.167    .421    .343    .428    .564    .738    .947   1.189   1.465   3.332   6.013
        .111    .297    .396    .542    .724    .94?   1.192   1.475   3.382   6.102
.143    .407    .297    .355    .466    .587    .745    .927   1.135   2.534   4.533
        .316    .249    .317    .426    .561    .723    .909   1.121   2.538   4.556
.125    .306    .258    .298    .377    .479    .602    .744    .906   1.995   3.544
        .293    .213    .261    .313    .448    .574    .719    .882   1.979   3.537
.111    .362    .226    .255    .3?7    .399    .498    .613    .743   1.618   2.859
        .273    .104    .270    .235    .369    .469    .585    .716   1.594   2.838
.100    .337    .200    .270    .270    .338    .420    .514    .672   1.341   2.360
        .?5?    .161    .1??    .240    .309    .391    .406    .591   1.313   2.332
.067    .731    .116    .120    .112    .173    .21?    .257    .307    .645   1.120
        .173    .09?    .039    .173    .154    .19?    .737    .287    .622   1.094
.050    .163    .075    .076    .037    .105    .179    .154    .183    .330    .655
        .176    .069    .06?    .075    .093    .115    .141    .170    .363    .636

Remarks on Tables 4.3 and 4.4 and Figures 4.C and 4.D:

Table 4.3 and Figure 4.C present the results of a study of
MSE($\hat{\mu}(\gamma_o)$) and MSE($\hat{\sigma}(\gamma_o)$) for the sample size n = 20. Table 4.4
and Figure 4.D present analogous results for the sample size n = 100.
The following remarks can be made:

1) For small sample sizes the variance term will dominate the bias
term when computing MSE. As sample size increases, the
effect decreases.

2) The curves for MSE look surprisingly like the curves for
the variance of the estimators. Examination of Table 4.1
reveals that [Bias($\hat{\mu}(\gamma_o)$)]$^2$ is approximately equal to
n Var($\hat{\mu}(\gamma_o)$) and [Bias($\hat{\sigma}(\gamma_o)$)]$^2$ is approximately equal to
n Var($\hat{\sigma}(\gamma_o)$).

3) Since we are comparing biased and unbiased estimators of
$\mu$ and $\sigma$, it is reasonable to compare the MSE of the
estimators.
4) When $\gamma = .333$, MSE($\hat{\mu}(\gamma_o)$) and MSE($\hat{\sigma}(\gamma_o)$) are very large
when $\gamma_o$ is misspecified. However when $\gamma = .05$, MSE($\hat{\mu}(\gamma_o)$)
and MSE($\hat{\sigma}(\gamma_o)$) do not change significantly when $\gamma_o$ is
considerably misspecified.

Table 4.3  MSE of $\hat{\mu}(\gamma_o)$, $\hat{\sigma}(\gamma_o)$   n = 20
(rows: true $\gamma$; columns: specified $\gamma_o$; for each $\gamma$ the first line
is for $\hat{\mu}(\gamma_o)$ and the second line is for $\hat{\sigma}(\gamma_o)$)

        .333    .250    .200    .167    .143    .125    .111    .100    .067    .050

.333    .004    .063    .279    .678   1.269   2.058   3.046   4.236  13.222  27.293
        .004    .077    .322    .758   1.390   2.222   3.256   4.492  13.717  28.035
.250    .031    .005    .048    .188    .433    .787   1.253   1.831   6.430  13.894
        .038    .005    .053    .204    .463    .833   1.317   1.913   6.610  14.176
.200    .032    .029    .005    .037    .134    .300    .537    .846   3.486   7.969
        .098    .032    .005    .039    .141    .314    .558    .875   3.563   8.097
.167    .139    .079    .025    .006    .030    .101    .221    .391   2.003   4.903
        .162    .086    .026    .005    .031    .104    .228    .402   2.040   4.969
.143    .1??    .1??       ?    .021    .006       ?       ?    .171   1.190   3.161
        .220    .143    .069    .021    .006    .025    .081    .175   1.208   3.197
.125    .244    .187    .114    .055    .018    .006    .021    .064    .708   2.083
        .274    .200    .120    .057    .018    .006    .021    .065    .718   2.104
.111    .290    .236    .162    .093    .046    .016    .006    .018    .420   1.400
        .321    .251    .169    .099    .047    .016    .006    .018    .424   1.412
.100    .332    .282    .208    .139    .082    .039    .014    .006    .242    .948
        .364    .298    .217    .143    .083    .040    .014    .006    .244    .955
.067    .488    .453    .390    .323    .259    .201    .150    .106    .006    .116
        .519    .470    .401    .330    .264    .204    .152    .107    .006    .117
.050    .587    .561    .50?    .451    .393    .338    .286    .237    .063    .007
        .614    .577    .519    .458    .398    .341    .288    .239    .064    .006

Table 4.4  MSE of $\hat{\mu}(\gamma_o)$, $\hat{\sigma}(\gamma_o)$   n = 100
(rows: true $\gamma$; columns: specified $\gamma_o$; for each $\gamma$ the first line
is for $\hat{\mu}(\gamma_o)$ and the second line is for $\hat{\sigma}(\gamma_o)$)

        .333    .250    .200    .167    .143    .125    .111    .100    .067    .050

.333    .018    .086    .318    .738   1.356   7.179   3.706   4.441  13.74?  28.281
        .021    .107    .372    .833   1.497   2.366   3.444   4.779  14.793  79.099
.250    .047    .024    .076    .227    .487    .859   1.346   1.949   6.715  14.422
        .052    .024    .033    .247    .522    .912   1.419   2.040   6.912  14.729
.200    .099    .045    .027    .066    .173    .351    .601    .926   3.672   8.308
        .112    .047    .026    .068    .181    .366    .675    .958   3.755   8.446
.167    .155    .092    .042    .028    .059    .139    .268    .449   2.136   5.143
        .175    .098    .042    .027    .060    .142    .276    .461   2.175   5.213
.143    .209    .145    .080    .039    .029    .055    .117    .216   1.291   3.342
        .233    .153    .082    .038    .028    .054    .118    .719   1.310   3.380
.125    .259    .197    .126    .070    .037    .030    .051    .100    .788   2.225
        .286    .208    .130    .071    .036    .029    .050    .110    .797   2.245
.111    .305    .245    .173    .109    .062    .036    .031    .048    .484   1.514
        .333    .259    .178    .111    .062    .034    .029    .047    .488   1.525
.100    .354    .290    .217    .150    .095    .056    .035    .031    .296   1.042
        .374    .304    .724    .153    .096    .056    .033    .030    .297   1.048
.067    .497    .458    .395    .379    .266    .210    .160    .118    .032    .161
        .526    .474    .405    .315    .270    .717    .161    .118    .031    .160
.050    .593    .564    .517    .454    .397    .343    .792    .745    .079    .033
        .619    .580    .521    .461    .402    .346    .294    .246    .078    .032

Based on consideration of the bias, variance, and MSE of each
estimator, the worst situation is to have data that are almost symmetric
and to misspecify them as very skewed left. The consequences of
misspecification are not severe when the data are skewed left. One will not
do too badly if one uses estimators of $\mu$ and $\sigma$ based on specifying
$\gamma_o = .333$, .25, or .20 regardless of whether the distribution is
skewed left or symmetric.

The techniques of determining $\gamma$ investigated in Sections 3.2
and 3.3 seem to lead to a specification of $\gamma$ that is in the range of
the true value of $\gamma$. Thus estimators of $\mu$ and $\sigma$ based on such a
specification should yield reliable estimates of $\mu$ and $\sigma$.
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5.  QUANTILE  REGRESSION  AND  COMPARISON  FOR  K  SAMPLES 
In  the  previous  sections  we  described  techniques  to 

1)  identify  Q  or  estimate  v  (Section  3), 

o 

2)  estimate  u  and  a  (Section  4) 

in  the  model 


Q(u)  =  p  +  o  Qq(u) 

The  results  of  Section  3  have  been  generalized  to  the  k  sample  problem. 
The  results  of  Section  4  generalize  directly  to  the  estimation  of 
location  and  scale  parameters  using  independent  samples  from  k  popula¬ 
tions. 

Let us restate the model for k sample quantile regression. We
assume

$$Q_i(u) = \mu_i + \sigma_i Q_o(u) , \quad i = 1, \ldots, k ,$$

where

$$\mu_i = \alpha_\mu + \beta_\mu X_i ,$$

$$\sigma_i = \alpha_\sigma + \beta_\sigma X_i ,$$

$Q_i(u)$ is the quantile function of the ith population, $Q_o(u)$ is an
unknown quantile function or $Q_o(u) = (Q_o^*(u))^\gamma$ where $Q_o^*(u)$ is
completely specified and $\gamma$ is an unknown shape parameter common to
the k populations, and $X_i$ is a numerical characteristic of the ith
population. We further assume $X_1 \le \cdots \le X_k$ for convenience.

A commonly assumed quantile regression model is

$$Q_i(u) = A(u) + B(u) X_i$$

where A(u) and B(u) are unknown constants which depend on u. We
desire estimators of A(u) and B(u) for a specified value of u.

Section 5.1 discusses the contributions of Brown and Mood (1950)
and Hogg (1975) in the area of nonparametric quantile regression and
the work of Griffiths and Willcox (1978) in parametric quantile
regression. We also show the equivalence of the model of (5.1.1) and
(5.1.2) to the model of (5.1.3) citing the work of Griffiths and Willcox
(1978).

In Section 5.2 the generalized least squares technique is used
to estimate $\alpha_\mu$, $\beta_\mu$, $\alpha_\sigma$, and $\beta_\sigma$. We state pertinent
hypotheses about the regression parameters and provide test statistics
for the hypotheses based on the asymptotic distribution of the estimated
parameters.

Section 5.3 discusses how the regression technique of Section 5.2
can be applied to the k-sample comparison problem under certain
restrictions. Test statistics for pertinent hypotheses about the
location and/or scale parameters of the k populations are provided.

5.1  K  Sample  Quantile  Regression 

In this section we discuss the nonparametric technique of Hogg
(1975) and the parametric approach of Griffiths and Willcox (1978) to
estimate k sample quantile (percentile) relationships. The data for
the regression problem may consist of k random samples
$\{Y_{i1}, \ldots, Y_{in_i},\ i = 1, \ldots, k\}$ of the dependent variable together
with the corresponding values $(X_i,\ i = 1, \ldots, k)$ of the independent
variable or it may consist of a bivariate sample
$\{(X_i, Y_i),\ i = 1, \ldots, n\}$.

Brown and Mood (1950) propose a nonparametric technique to
estimate the median regression line for the model
median$(Y|X) = \alpha + \beta X$ based on examining the residuals of the
regression. They assume that for the bivariate sample
$\{(X_i, Y_i),\ i = 1, \ldots, n\}$ the errors $Y_i - \alpha - \beta X_i$ have the same
distribution for all X. They then estimate the median of the
distribution of Y given X by

$$\hat\alpha + \hat\beta X ,$$

where

$$\mathrm{median}\,(Y_i - \hat\alpha - \hat\beta X_i) = 0 , \quad X_i \le \mathrm{median}(X) ,$$

$$\mathrm{median}\,(Y_i - \hat\alpha - \hat\beta X_i) = 0 , \quad X_i > \mathrm{median}(X) .$$

In words, they split the sample into two subsamples of size $n_1$ and
$n_2$, $n_1 = n/2$, and then graphically find estimates $\hat\alpha$, $\hat\beta$ so that
the median of the residuals $Y_i - \hat\alpha - \hat\beta X_i$ is zero for each batch.
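
The two median conditions above can also be solved numerically rather than graphically. The following is a minimal sketch, not Brown and Mood's own procedure: it assumes the split is made at median(X), uses bisection on the slope (the difference between the two half-sample residual medians is nondecreasing in the slope), and the function name `brown_mood_line` is hypothetical.

```python
from statistics import median


def brown_mood_line(x, y, tol=1e-10):
    """Fit y ~ a + b*x so that the residual median is (nearly) zero in each half."""
    cut = median(x)
    left = [i for i, xi in enumerate(x) if xi <= cut]
    right = [i for i, xi in enumerate(x) if xi > cut]
    if not left or not right:
        raise ValueError("x needs observations on both sides of its median")

    def half_medians(b):
        # Medians of y - b*x within the low-x and high-x subsamples.
        res = [yi - b * xi for xi, yi in zip(x, y)]
        return median(res[i] for i in left), median(res[i] for i in right)

    def gap(b):
        m_left, m_right = half_medians(b)
        return m_left - m_right

    # Bracket a sign change of gap(), then bisect.
    lo, hi = -1.0, 1.0
    for _ in range(200):
        if gap(lo) <= 0:
            break
        lo *= 2.0
    for _ in range(200):
        if gap(hi) >= 0:
            break
        hi *= 2.0
    for _ in range(200):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid

    b = 0.5 * (lo + hi)
    m_left, m_right = half_medians(b)
    a = 0.5 * (m_left + m_right)  # common residual median becomes the intercept
    return a, b
```

For data lying exactly on a line the sketch recovers that line; with noisy data the two half-sample medians agree only up to the bisection tolerance and ties in the data.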

Hogg (1975) modifies the technique of Brown and Mood in a
natural way to estimate the pth percentile of the Y's. Hogg's model is

$$Y_p = A(p) + B(p) X$$

where $Y_p$ denotes the pth percentile of the Y observations for a
particular value of X. In terms of the quantile function we can
write

$$Q_{Y|X}(p) = A(p) + B(p) X$$

where $Q_{Y|X}(\cdot)$ is the quantile function of Y given X and A(p) and
B(p) are unknown constants which may vary with p. By examining
the signs of the residuals, Hogg estimates the regression line of
the pth percentile so that a fraction p of the data points are below
the regression line. Hogg proposes statistics for testing

$$H_o: Q_{Y|X}(u) = A_o(u) + B_o(u) X ,$$

where $A_o(u)$ and $B_o(u)$ are specified, based on the binomial distribution
of the number of observations below the hypothesized regression line.
Several alternative procedures for splitting the data into more than
two groups are also proposed.
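
The binomial idea behind such a test can be sketched as follows. This is an illustration under stated assumptions, not Hogg's exact statistic: it treats the number of points below the hypothesized line as Binomial(n, p) and uses a standard two-sided exact p-value (the total probability of outcomes no more likely than the observed count). The function name `below_line_test` is hypothetical.

```python
from math import comb


def below_line_test(x, y, a0, b0, p):
    """Count points below a0 + b0*x and return (count, two-sided exact p-value)."""
    n = len(x)
    below = sum(1 for xi, yi in zip(x, y) if yi < a0 + b0 * xi)
    # Binomial(n, p) probabilities for 0, 1, ..., n points below the line.
    pmf = [comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]
    # Sum the probabilities of all counts that are no more likely than the observed one.
    p_value = sum(q for q in pmf if q <= pmf[below] * (1 + 1e-12))
    return below, min(1.0, p_value)
```

A small p-value suggests that the fraction of observations below the hypothesized line is not compatible with p.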

Griffiths and Willcox (1978) assume the model

$$Q_{Y|X}(u) = \mu_X + \sigma_X Q_o(u) \quad \text{for all } X ,$$

where

$$\mu_X = \alpha_\mu + \beta_\mu X ,$$

$$\sigma_X = \alpha_\sigma + \beta_\sigma X .$$

This is equivalent to assuming

$$Q_{Y|X}(u) = A(u) + B(u) X ,$$

i.e. a linear regression model, where

$$A(u) = \alpha_\mu + \alpha_\sigma Q_o(u) ,$$

$$B(u) = \beta_\mu + \beta_\sigma Q_o(u) .$$

They then estimate $\alpha_\mu$, $\beta_\mu$, $\alpha_\sigma$, $\beta_\sigma$ using iterative maximum
likelihood procedures on the weighted residuals

$$[Q_{Y|X}(u) - (\alpha_\mu + \beta_\mu X)] / (\alpha_\sigma + \beta_\sigma X) .$$

By weighting the estimates $\hat\alpha_\mu$, $\hat\beta_\mu$, $\hat\alpha_\sigma$, $\hat\beta_\sigma$ using their
estimated variance matrix, point estimates or interval estimates of A(u)
and B(u) can be computed. The authors use $Q_o(u) = \Phi^{-1}(u)$. The
likelihood equations do not have a closed-form solution, and Griffiths
and Willcox use linear programming methods to determine optimal values
for the parameters. They state that an advantage in using a parametric
model for the data is a gain in precision and efficiency.
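
The equivalence between the location-scale form of the Griffiths and Willcox model and the linear form in X follows by substituting the two linear relations and collecting terms; the short derivation below only restates quantities already defined in this section.

```latex
\begin{align*}
Q_{Y|X}(u) &= \mu_X + \sigma_X\, Q_o(u) \\
           &= (\alpha_\mu + \beta_\mu X) + (\alpha_\sigma + \beta_\sigma X)\, Q_o(u) \\
           &= \bigl[\alpha_\mu + \alpha_\sigma Q_o(u)\bigr]
              + \bigl[\beta_\mu + \beta_\sigma Q_o(u)\bigr] X \\
           &= A(u) + B(u)\, X .
\end{align*}
```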

In the next section we describe a parametric approach to solving
the k sample quantile regression problem based on a quantile function
approach. The procedure is more general than that of Griffiths and
Willcox yet incorporates parametric assumptions which give it certain
advantages over the nonparametric technique of Hogg.

5.2  A Quantile Function Procedure for K Sample Quantile Regression

In this section we generalize the model of Griffiths and Willcox
(1978) to allow any $Q_o(u)$. We assume

$$Q_{Y|X}(u) = \mu_X + \sigma_X Q_o(u)$$

or more specifically

$$Q_i(u) = \mu_i + \sigma_i Q_o(u) , \quad i = 1, \ldots, k , \qquad (5.2.1)$$

where

$$\mu_i = \alpha_\mu + \beta_\mu X_i , \qquad \sigma_i = \alpha_\sigma + \beta_\sigma X_i , \quad i = 1, \ldots, k . \qquad (5.2.2)$$

Using the estimates $\hat\mu_i$, $\hat\sigma_i$ based on LCOS (Section 4), we obtain
estimates of $\alpha_\mu$, $\beta_\mu$, $\alpha_\sigma$, $\beta_\sigma$ using generalized least squares.

The first step is to identify the $Q_o$ function common to all k
populations using the techniques of Section 3.1 and 3.3 or, if
appropriate, to estimate the shape parameter $\gamma$ of the specified
$Q_o^*$ using the techniques of Section 3.2 and 3.3.

From each of the random samples $\{Y_{i1}, \ldots, Y_{in_i},\ i = 1, \ldots, k\}$
one forms estimates $\hat\mu_i$ and $\hat\sigma_i$ of $\mu_i$ and $\sigma_i$ using optimal or
asymptotically optimal LCOS.
Theorem 5.2.1: Assuming that the standard conditions for the validity
of the Cramer-Rao bounds are satisfied and that the spacings
$\{0, u_1, \ldots, u_r, 1\}$ satisfy

$$\max_j\,(u_j - u_{j-1}) \to 0 \quad \text{as } r \to \infty ,$$

then

$$\sqrt{n}\left[\begin{pmatrix}\hat\mu\\ \hat\sigma\end{pmatrix} - \begin{pmatrix}\mu\\ \sigma\end{pmatrix}\right] \xrightarrow{L} N_2\bigl(0_2,\ \sigma^2 I^{-1}(\theta)\bigr) \quad \text{as } r \to \infty ,$$

where $\hat\mu$, $\hat\sigma$ are computed using the optimal or asymptotically optimal
LCOS based on r order statistics and $I(\theta)$ is the Fisher information
matrix of $(\mu, \sigma)$ defined by (4.1.6).

Proof: (nonrigorous)

The asymptotic zero mean normality of
$\sqrt{n}\,[(\hat\mu, \hat\sigma)^T - (\mu, \sigma)^T]$ follows from Theorem 2.1.4 and the fact
that $(\hat\mu, \hat\sigma)^T$ are ABLUE's for $(\mu, \sigma)^T$. The variance of
$(\hat\mu, \hat\sigma)^T$ is given by

$$\frac{\sigma^2}{n}\begin{pmatrix}K_{11} & K_{12}\\ K_{12} & K_{22}\end{pmatrix}^{-1}$$

(see Ogawa 1951 or Eubank 1979) where $K_{11}$, $K_{12}$, and $K_{22}$ are defined
by (4.1.1), (4.1.2), and (4.1.3). It is sufficient to show that

$$\begin{pmatrix}K_{11} & K_{12}\\ K_{12} & K_{22}\end{pmatrix} \text{ converges componentwise to } I(\theta) \text{ as } r \to \infty .$$

Consider $K_{11}$ defined by

$$K_{11} = \sum_{j=1}^{r+1} \frac{[f_oQ_o(u_j) - f_oQ_o(u_{j-1})]^2}{u_j - u_{j-1}} .$$

Using the Mean Value Theorem,

$$f_oQ_o(u_j) = f_oQ_o(u_{j-1}) + (u_j - u_{j-1})\, f_oQ_o'(u_j^*)$$

so that

$$\frac{f_oQ_o(u_j) - f_oQ_o(u_{j-1})}{u_j - u_{j-1}} = f_oQ_o'(u_j^*) ,$$

where $u_{j-1} \le u_j^* \le u_j$, $u_0 = 0$, $u_{r+1} = 1$. Then we can write the
Riemann integral

$$I_{11}(\theta) = \int_0^1 (f_oQ_o'(u))^2\, du = \lim_{r \to \infty} \sum_{j=1}^{r+1} (f_oQ_o'(u_j^*))^2 (u_j - u_{j-1}) = \lim_{r \to \infty} \sum_{j=1}^{r+1} \frac{[f_oQ_o(u_j) - f_oQ_o(u_{j-1})]^2}{u_j - u_{j-1}} .$$

The convergence of $K_{22}$ to $I_{22}(\theta)$ and $K_{12}$ to $I_{12}(\theta)$ follow
analogously.
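
The Riemann-sum argument for $K_{11}$ can be checked numerically in a simple case. The sketch below is an illustration only: it assumes the standard normal base distribution, for which $f_oQ_o(u) = \phi(\Phi^{-1}(u))$ and $\int_0^1 (f_oQ_o'(u))^2 du = \int_0^1 (\Phi^{-1}(u))^2 du = 1$, and it assumes equally spaced spacings $u_j = j/(r+1)$ rather than the optimal spacings of Ogawa or Eubank. The helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def fq_normal(u):
    """Density-quantile function f_o(Q_o(u)) of the standard normal (0 at u = 0 and 1)."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(u))


def k11(r):
    """K_11 for equally spaced spacings u_j = j/(r+1), with u_0 = 0 and u_{r+1} = 1."""
    u = [j / (r + 1) for j in range(r + 2)]
    return sum(
        (fq_normal(u[j]) - fq_normal(u[j - 1])) ** 2 / (u[j] - u[j - 1])
        for j in range(1, r + 2)
    )


for r in (5, 50, 1000):
    # The printed values should increase toward the limit 1 as r grows.
    print(r, round(k11(r), 4))
```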

Remarks on Theorem 5.2.1:

1)  A rigorous proof would entail examining the limits $n \to \infty$
and $r \to \infty$ more closely as well as the asymptotic normality
of $(\hat\mu, \hat\sigma)$ (see Chernoff, Gastwirth, and Johns 1967, and
Stigler 1974). Theorem 2.1.4 holds for fixed $u_1, \ldots, u_r$
but here we let $r \to \infty$.

2)  It seems clear that the asymptotically optimal spacings
generated by Eubank (1979) satisfy the conditions of
Theorem 5.2.1 since H(u) and $H^{-1}(u)$ are both defined on
[0,1]. One should substitute $H^{-1}(j/(r+1))$ for $u_j$ in the
expressions for $K_{11}$, $K_{12}$, and $K_{22}$ to get the variance of
Eubank's estimators.

3)  It is not clear that the optimal spacings of Ogawa (1950)
satisfy the conditions of Theorem 5.2.1. Examination of
the ARE of the estimators of $\mu$ and $\sigma$ using Ogawa's
formulation suggests that ARE$(\hat\mu, \hat\sigma) \to 1$ as $r \to \infty$. This
seems to indicate that the conditions on the spacings are satisfied.

Corollary 5.2.1: When there are k independent samples, the
estimators $\hat\mu_i$ and $\hat\sigma_i$ based on LCOS from a random sample of
size $n_i$ from population i satisfy

$$\sqrt{n_i}\left[\begin{pmatrix}\hat\mu_i\\ \hat\sigma_i\end{pmatrix} - \begin{pmatrix}\mu_i\\ \sigma_i\end{pmatrix}\right] \xrightarrow{L} N_2\bigl(0_2,\ \sigma_i^2 I^{-1}(\theta)\bigr) \quad \text{as } r \to \infty ,$$

and the k distributions are independent.


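Theorem 5.2.2: a) When there are k populations satisfying the
model of (5.2.1) and (5.2.2), then ABLUE's $(\hat\alpha_\mu, \hat\beta_\mu, \hat\alpha_\sigma, \hat\beta_\sigma)$ of
$(\alpha_\mu, \beta_\mu, \alpha_\sigma, \beta_\sigma)$ are given by

$$\begin{pmatrix}\hat\alpha_\mu\\ \hat\beta_\mu\\ \hat\alpha_\sigma\\ \hat\beta_\sigma\end{pmatrix} = \left[\sum_{i=1}^{k} \frac{n_i}{\sigma_i^2}\, I(\theta) \otimes W_i\right]^{-1} \left[\sum_{i=1}^{k} \frac{n_i}{\sigma_i^2} \left(I_2 \otimes \begin{pmatrix}1\\ X_i\end{pmatrix}\right) I(\theta) \begin{pmatrix}\hat\mu_i\\ \hat\sigma_i\end{pmatrix}\right] \qquad (5.2.3)$$

where $I(\theta)$ is the Fisher information matrix of $(\mu, \sigma)$ defined by
(4.1.6), $I_n$ is the n dimensional identity matrix, and the Kronecker
product, $A \otimes B$ (Rao 1973, p. 29), of an n x m matrix $A = (a_{ij})$ and
an r x s matrix $B = (b_{ij})$ is the nr x ms matrix

$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1m}B\\ \vdots & & & \vdots\\ a_{n1}B & \cdots & & a_{nm}B \end{pmatrix} ,$$

and

$$W_i = \begin{pmatrix} 1 & X_i\\ X_i & X_i^2 \end{pmatrix} , \quad i = 1, \ldots, k .$$

Using the asymptotic normality of $(\hat\mu_i, \hat\sigma_i)^T$ (Theorem 5.2.1), the
observations $(\hat\mu_i, \hat\sigma_i)^T$ approximately fit the framework of the
Gauss-Markov Theorem (see Rao 1973, pp. 544-546) and we can form the
ABLUE's $(\hat\alpha_\mu, \hat\beta_\mu, \hat\alpha_\sigma, \hat\beta_\sigma)$ of $(\alpha_\mu, \beta_\mu, \alpha_\sigma, \beta_\sigma)$ given by
(5.2.3) which have asymptotic variance given by (5.2.4).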


Remarks on Theorem 5.2.2:

1)  Notice that $\sigma_i^2$ is an unknown parameter so that usually
an iterative estimation scheme is in order. However,
analogous to the treatment of $\sigma^2$ in the continuous
parameter time series regression model of (4.2.2), we can
treat $\sigma_i^2$ as the scale parameter of a Brownian bridge
process (see Parzen 1979a). Hence under the assumption
of (5.2.1), i.e. that $Q_i(u)$ is a location-scale shift of
some $Q_o(u)$, an "independent" estimator of $\sigma_i$ is provided
by $\tilde\sigma_{oi}$ where

$$\tilde\sigma_{oi} = \int_0^1 f_oQ_o(u)\, \tilde q_i(u)\, du .$$

This is the k-sample analog of $\tilde\sigma_o$ defined in Section 3.1.
The estimator is consistent for $\sigma_i$ when (5.2.1) is true.
Consequently we compute the estimators

$$\begin{pmatrix}\hat\alpha_\mu\\ \hat\beta_\mu\\ \hat\alpha_\sigma\\ \hat\beta_\sigma\end{pmatrix} = \left[\sum_{i=1}^{k} \frac{n_i}{\tilde\sigma_{oi}^2}\, I(\theta) \otimes W_i\right]^{-1} \left[\sum_{i=1}^{k} \frac{n_i}{\tilde\sigma_{oi}^2} \left(I_2 \otimes \begin{pmatrix}1\\ X_i\end{pmatrix}\right) I(\theta) \begin{pmatrix}\hat\mu_i\\ \hat\sigma_i\end{pmatrix}\right] \qquad (5.2.5)$$

with estimated variance

$$\hat V = I^{-1}(\theta) \otimes \left(\sum_{i=1}^{k} \frac{n_i}{\tilde\sigma_{oi}^2}\, W_i\right)^{-1} . \qquad (5.2.6)$$

3)  Programs to implement the estimation techniques are
available. Subroutine KSAM forms quantile-box plots and uses
the goodness-of-fit techniques of Section 3.3 to determine
the distributional shape of the k samples. Subroutines
QTOLS, QTOLSC, and QTOLSW compute estimates $\hat\mu_i$ and $\hat\sigma_i$ of
$\mu_i$ and $\sigma_i$ for a specified $Q_o$ function using LCOS. Subroutine
LSTOAB estimates the coefficients $(\alpha_\mu, \beta_\mu, \alpha_\sigma, \beta_\sigma)$ and their
variance using (5.2.5) and (5.2.6) based on the k pairs of
observations $(\hat\mu_i, \hat\sigma_i)$, i = 1, ..., k. Listings of the
subroutines are on file at the Institute of Statistics,
Texas A&M University.

4)  Model (5.2.2) has been used for simplicity. A general
parametric model relating $\mu_i$ and $\sigma_i$ to $X_i$ is

$$\mu_i = f_\mu(X_i, \theta_\mu) ,$$

$$\sigma_i = f_\sigma(X_i, \theta_\sigma) , \quad i = 1, \ldots, k .$$

Scatter diagrams of $\hat\mu_i$ and $\hat\sigma_i$ vs $X_i$ will help determine
appropriate $f_\mu(\cdot\,, \cdot)$ and $f_\sigma(\cdot\,, \cdot)$ functions.
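
The generalized least squares step can be sketched in a few lines. This is an illustration, not the LSTOAB subroutine: it assumes that the ith pair $(\hat\mu_i, \hat\sigma_i)$ has covariance $(\tilde\sigma_{oi}^2/n_i) I^{-1}(\theta)$, in which case the Kronecker structure makes the point estimates reduce to two weighted least squares fits on $X_i$ with common weights $n_i/\tilde\sigma_{oi}^2$, and the variance is $I^{-1}(\theta)$ Kronecker the inverse weighted cross-product matrix. The function and argument names are hypothetical.

```python
def fit_location_scale_lines(x, mu_hat, sigma_hat, n, sigma_tilde, info_inv):
    """Return (a_mu, b_mu, a_sigma, b_sigma) and their 4x4 estimated covariance.

    info_inv is the 2x2 inverse Fisher information of (mu, sigma) for the
    assumed base distribution (an input the caller must supply).
    """
    w = [ni / st**2 for ni, st in zip(n, sigma_tilde)]
    # Weighted cross-product matrix S = sum_i w_i * [[1, x_i], [x_i, x_i^2]].
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = s0 * s2 - s1 * s1
    s_inv = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]

    def wls(z):
        # Weighted least squares of z on (1, x) with weights w.
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        return (s_inv[0][0] * t0 + s_inv[0][1] * t1,
                s_inv[1][0] * t0 + s_inv[1][1] * t1)

    a_mu, b_mu = wls(mu_hat)
    a_sigma, b_sigma = wls(sigma_hat)
    # Kronecker product info_inv (x) s_inv, ordered (a_mu, b_mu, a_sigma, b_sigma).
    cov = [[info_inv[p][q] * s_inv[r][c] for q in range(2) for c in range(2)]
           for p in range(2) for r in range(2)]
    return (a_mu, b_mu, a_sigma, b_sigma), cov
```

For a normal base distribution with unit scale one would pass `info_inv = [[1.0, 0.0], [0.0, 0.5]]`; for other shapes the matrix must come from the appropriate information matrix.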

The final step in the k-sample quantile regression problem is
to estimate the parameters A(u) and B(u) in

$$Q_i(u) = A(u) + B(u) X_i , \quad i = 1, \ldots, k .$$

By making the substitution

$$A(u) = \alpha_\mu + \alpha_\sigma Q_o(u) ,$$

$$B(u) = \beta_\mu + \beta_\sigma Q_o(u) ,$$

we obtain the estimators

$$\hat A(u) = \hat\alpha_\mu + \hat\alpha_\sigma Q_o(u) ,$$
$$\hat B(u) = \hat\beta_\mu + \hat\beta_\sigma Q_o(u) . \qquad (5.2.7)$$

A significant advantage in this estimation scheme over other methods
is that one need not use sophisticated methods to estimate A(u) and
B(u) for each value of u for which a regression line is desired. One
can simply substitute the appropriate value of $Q_o(u)$ in (5.2.7).
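
The substitution in (5.2.7) is a one-line computation once the four coefficients are available. The sketch below assumes a normal base quantile function by default; the function name `quantile_line` is hypothetical, and the coefficients are the fitted lines reported for the salary data in Section 6.1, used here only as illustrative input.

```python
from statistics import NormalDist


def quantile_line(u, a_mu, b_mu, a_sigma, b_sigma, q0=NormalDist().inv_cdf):
    """Return (A(u), B(u)) from (5.2.7) for a base quantile function q0."""
    z = q0(u)
    return a_mu + a_sigma * z, b_mu + b_sigma * z


# Coefficients (a_mu, b_mu, a_sigma, b_sigma) taken from the Section 6.1 fit.
coef = (19.8642, 0.5024, 1.3914, 0.0363)
for u in (0.25, 0.50, 0.75):
    a_u, b_u = quantile_line(u, *coef)
    print(f"u = {u:.2f}   A(u) = {a_u:.3f}   B(u) = {b_u:.3f}")
```

Any other completely specified $Q_o$ can be passed as `q0`, so a regression line for a new value of u costs only one evaluation of the base quantile function.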

Hypothesis testing procedures:

The first hypothesis of interest is

$$H_o: Q_i(u) = \mu_i + \sigma_i Q_o(u) , \quad i = 1, \ldots, k .$$

Hypothesis $H_o$ examines the adequacy of the model
$Q_i(u) = \mu_i + \sigma_i Q_o(u)$. To test this hypothesis we propose the GOF
procedures outlined in Section 3.3.

One might also wish to examine the adequacy of the linear model
(5.2.2) for $\mu_i$ and $\sigma_i$. Scatter diagrams of $\hat\mu_i$ vs. $X_i$ and $\hat\sigma_i$ vs.
$X_i$ provide a quick graphic technique to check the linear relationship
between the estimated parameters and the X values.

A hypothesis which states that there is no linear relationship
between the quantiles of Y and X is

$$H_o: \beta_\mu = \beta_\sigma = 0 .$$

To test this hypothesis one could use the joint asymptotic
distribution of $\hat\beta_\mu$ and $\hat\beta_\sigma$ given by Theorem 5.2.2 and form the test
statistic

$$X^2 = (\hat\beta_\mu, \hat\beta_\sigma)\, [\widehat{\mathrm{Var}}(\hat\beta_\mu, \hat\beta_\sigma)]^{-1}\, (\hat\beta_\mu, \hat\beta_\sigma)^T$$

where $\widehat{\mathrm{Var}}(\hat\beta_\mu, \hat\beta_\sigma)$ consists of the appropriate elements of the
estimated variance matrix given by (5.2.6). Under $H_o$, $X^2$ has an
asymptotic $\chi^2$ distribution with two degrees of freedom. Large values
of $X^2$ indicate departure from $H_o$.
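
The quadratic form and its p-value are easy to compute by hand for a 2 x 2 variance matrix. The sketch below is illustrative: the function name `slope_chi2` and its inputs are hypothetical, and it relies on the fact that the upper tail probability of a chi-square variable with two degrees of freedom is exactly $e^{-x/2}$.

```python
from math import exp


def slope_chi2(b_mu, b_sigma, var):
    """Return X^2 = b' V^{-1} b and its chi-square(2) p-value.

    var is the 2x2 estimated covariance matrix of (b_mu, b_sigma).
    """
    (v11, v12), (v21, v22) = var
    det = v11 * v22 - v12 * v21
    # Quadratic form with the explicit inverse of a 2x2 matrix.
    x2 = (v22 * b_mu**2 - (v12 + v21) * b_mu * b_sigma + v11 * b_sigma**2) / det
    return x2, exp(-x2 / 2.0)
```

One would reject $H_o$ at level .05 when the returned p-value is below .05, which corresponds to $X^2$ exceeding the critical value 5.99.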

Other hypotheses involving $\alpha_\mu$, $\beta_\mu$, $\alpha_\sigma$, $\beta_\sigma$ can be tested using
the asymptotic normality of $(\hat\alpha_\mu, \hat\beta_\mu, \hat\alpha_\sigma, \hat\beta_\sigma)$. Some of these
hypotheses are discussed in the next section.

5.3  The K-Sample Comparison Problem

The k-sample comparison problem is defined to be the estimation
and comparison of the location and/or scale parameters of k
populations based on samples $\{Y_{i1}, \ldots, Y_{in_i},\ i = 1, \ldots, k\}$. Usually one
assumes

$$F_i(y) = F_o\!\left(\frac{y - \mu_i}{\sigma_i}\right)$$

or

$$Q_i(u) = \mu_i + \sigma_i Q_o(u) , \quad i = 1, \ldots, k ,$$

where $\mu_i$ and $\sigma_i$ are the location and scale parameters respectively
of the ith population and $F_o$ and $Q_o$ are completely specified. There
are a multitude of parametric and nonparametric procedures available
to compare the $\mu_i$'s or the $\sigma_i$'s.

If one records some numerical characteristic, $X_i$, of the ith
population, e.g. treatment level, one can specify a relationship
between $(\mu_i, \sigma_i)$ and $X_i$ such as

$$\mu_i = \alpha_\mu + \beta_\mu X_i , \qquad \sigma_i = \alpha_\sigma + \beta_\sigma X_i . \qquad (5.3.1)$$

Thus the estimation procedures of Section 5.2 are also appropriate
for a particular type of location and scale comparison problem.

A hypothesis which examines the equality of the k location
parameters is

$$H_\mu: \mu_1 = \cdots = \mu_k .$$

Under the model (5.3.1) this hypothesis is equivalent to

$$H_\mu: \beta_\mu = 0 .$$

In Section 5.2 we state the asymptotic variance of $\hat\beta_\mu$ from which we
can form the test statistic

$$z_\mu = \hat\beta_\mu / (\widehat{\mathrm{Var}}(\hat\beta_\mu))^{1/2}$$

where $\widehat{\mathrm{Var}}(\hat\beta_\mu)$ is the appropriate element of (5.2.6). Under $H_\mu$,
$z_\mu$ has an asymptotic N(0, 1) distribution. For the alternative
$H_a: \beta_\mu \ne 0$, one rejects $H_\mu$ at level $\alpha$ if $|z_\mu| > \Phi^{-1}(1 - \alpha/2)$.
The test is not appropriate, however, for the general alternative

$$H_a: \text{not all } \mu_i \text{ are equal.}$$
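
The z test for a single slope can be sketched as below. The function name `slope_z_test` and the numbers in the usage lines are hypothetical; in practice the variance would be the appropriate diagonal element of (5.2.6).

```python
from math import sqrt
from statistics import NormalDist


def slope_z_test(b_hat, var_b, alpha=0.05):
    """Return (z, reject) for H: slope = 0 against the two-sided alternative."""
    z = b_hat / sqrt(var_b)
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # Phi^{-1}(1 - alpha/2)
    return z, abs(z) > critical


# Hypothetical slope estimates and variances, for illustration only.
print(slope_z_test(0.50, 0.004))  # large |z|: reject
print(slope_z_test(0.04, 0.002))  # small |z|: fail to reject
```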

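A hypothesis which states the equality of the k scale
parameters is

$$H_\sigma: \sigma_1 = \cdots = \sigma_k ,$$

which under model (5.3.1) is equivalent to

$$H_\sigma: \beta_\sigma = 0 .$$

Analogous to the test statistic for $H_\mu$, we can form the test
statistic, $z_\sigma$, for $H_\sigma$ defined by

$$z_\sigma = \hat\beta_\sigma / (\widehat{\mathrm{Var}}(\hat\beta_\sigma))^{1/2} ,$$

which has an asymptotic N(0, 1) distribution under $H_\sigma$.

One might also wish to test simultaneously the equality of
the $\mu_i$'s and the $\sigma_i$'s, i.e. test

$$H_o: \mu_1 = \cdots = \mu_k \ \text{ and } \ \sigma_1 = \cdots = \sigma_k$$

or

$$H_o: \beta_\mu = \beta_\sigma = 0 .$$

To test this hypothesis use the test statistic $X^2$ of Section 5.2
defined by

$$X^2 = (\hat\beta_\mu, \hat\beta_\sigma)\, [\widehat{\mathrm{Var}}(\hat\beta_\mu, \hat\beta_\sigma)]^{-1}\, (\hat\beta_\mu, \hat\beta_\sigma)^T ,$$

which under $H_o$ has an asymptotic $\chi^2$ distribution with two degrees
of freedom.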

The advantages of this comparison procedure are:

1)  There are no restrictions on $Q_o$ (e.g. $Q_o = \Phi^{-1}$).

2)  One can accommodate heterogeneity of the $\sigma_i$'s (or $\mu_i$'s)
when comparing the $\mu_i$'s (or $\sigma_i$'s). The standard ANOVA
procedure for the comparison of location parameters
assumes $\sigma_1 = \cdots = \sigma_k$ and $Q_o = \Phi^{-1}$. The resulting F
statistic performs fairly well when either of the
assumptions is violated but its performance worsens
when both assumptions are violated, particularly if the
$n_i$'s vary considerably (Box 1954).

3)  In many situations (e.g. the Weibull distribution) the
location parameter is a threshold value and not a measure
of central tendency. The mean and median of the
distribution will depend on scale and shape parameters as well
as the location parameter. Thus a procedure based on
comparing sample means or medians seems inappropriate
for comparing location parameters.

A substantial disadvantage of the procedure is that location
and scale comparisons based on the model of (5.3.1) have very low
power against general alternatives.

6.  EXAMPLES

In  this  section  two  published  data  sets  are  analyzed  using  the 
techniques  of  Sections  3  through  5.  The  results  of  the  analyses  are 
compared  to  results  of  other  investigations  of  the  data.  Computing 
programs  to  implement  the  techniques  of  data  analysis  described  in 
this  dissertation  have  been  developed  by  the  author.  The  programs 
make  use  of  subroutines  which  implement  the  nonparametric  data 
modeling  techniques  of  Parzen  (1979).  All  computing  was  performed 
on an AMDAHL 470V/6 computer at Texas A&M University.

6.1  Professors'  Salary  Example 

Hogg  (1975),  Griffiths  and  Willcox  (1978),  and  Angers  (1979) 
investigate  data  consisting  of  the  salaries  of  96  professors  at  a 
major university as a function of their years in service. Each of
the investigators estimates linear percentile lines for p = .25, .50,
.75. The techniques of Hogg (1975) and of Griffiths and Willcox
(1978)  have  been  described  in  Section  5.1.  The  approach  of  Angers 
(1979)  is  to  use  grafted  polynomials,  a  nonparametric  technique, 
where  the  curves  for  the  75th  and  25th  percentiles  are  restricted  to 
be  symmetric  about  the  curve  for  the  50th  percentile.  He  uses 
linear  percentile  regression  curves.  Table  6.1  summarizes  the 
estimated  quantile  regression  coefficients  obtained  by  each  author. 


Table 6.1  Estimated Parameters for Professors' Salary Data.

                               A(.25)   B(.25)   A(.50)   B(.50)   A(.75)   B(.75)

Hogg (1975)                    18.8     .300     20.0     .485     21.5     .625

Griffiths & Willcox (1978)     17.50    .40      19.15    .48      20.81    .56

Angers (1979)                  18.173   .331     19.646   .478     21.119   .625

The data are presented in Figure 6.A.

Griffiths  and  Willcox  state: 

"There  is  no  clear  evidence  in  the  data  of  departures  from 
normality  by  way  of  either  highly  skewed  or  heavily  tailed 
residual  distributions.  There  is,  however,  a  trend  to 
increasing  spread  ..." 

However, generally with salary data one would expect the data to
be skewed right particularly when there are few years in service.
There seem to be several outliers in the data when there are few
years in service. While one might expect increasing spread of the
salary distribution as years in service increases, the increase in
spread evidenced by these data does not seem substantial.

What one would like to detect is how the quantiles behave as
a function of years in service. One would like to determine which
of the potential curves in Figure 6.B represents the relationship
between salary quantiles and years in service and to estimate
the unknown parameters of the quantile regression function.


Figure 6.B  Possible Salary Quantile Regression Curves


Since the sample sizes are quite small for each value of X,
in order to use our quantile regression technique, it is necessary
to repartition the data by pooling homogeneous samples. Tukey (1977)
suggests that one way to partition Y observations when X is a random
variable is to use selected quantiles of X. In this study three
methods of partitioning the data were investigated:

1)  pooling the data into four year intervals;

2)  pooling the data into five year intervals;

3)  pooling the data using similar midrange values which resulted
in five samples representing 3, 3, 4, 4 and 6 years of
service respectively.

It was found that pooling into four samples, each representing five
years of service, was most satisfactory for this study. It seems that
there is a jump in salary after five years of service. Another
method of partitioning the data is to pool the data into overlapping
samples. However this technique violates the assumption of k
independent samples.

Based on pooling the data into four samples of five years each,
we shall describe each stage of the analysis. We use the mean value
of X within each sample as the value of $X_i$. Thus $X_1$ = 3, $X_2$ = 8,
$X_3$ = 13, $X_4$ = 18.
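
The pooling step can be sketched as follows. This is an illustration only: the function name `pool_by_interval` is hypothetical, the data in the usage lines are made up, and the intervals are assumed to be consecutive, equal-width, and to start at the smallest observed X.

```python
def pool_by_interval(x, y, width):
    """Pool (x, y) pairs into consecutive x-intervals of the given width.

    Returns a list of (mean x in the sample, sorted y values), one per sample.
    """
    start = min(x)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(int((xi - start) // width), []).append((xi, yi))
    pooled = []
    for key in sorted(groups):
        xs = [pair[0] for pair in groups[key]]
        ys = sorted(pair[1] for pair in groups[key])
        pooled.append((sum(xs) / len(xs), ys))
    return pooled


# Made-up example: one observation per year of service, years 1 through 20.
years = list(range(1, 21))
salaries = [20.0 + 0.5 * t for t in years]
print([x_bar for x_bar, _ in pool_by_interval(years, salaries, 5)])
```

The sorted Y values within each pooled sample are the order statistics from which the location and scale estimates of the next stages would be formed.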

Stage 1:  The quantile-box plots of all four samples are given
in Figure 6.C. The shift in location is very evident. The shapes
of the distributions seem compatible but all of the plots show
varying degrees of skewness. However all of the sample sizes are
relatively small and it is difficult to identify incompatible
shapes from the quantile-box plots. The plots do suggest that one
should test the goodness-of-fit of the data to symmetric and slightly
skewed $Q_o$ functions.

Using the technique of Section 3.3 we test the goodness-of-fit
of the data to the normal, logistic, and Weibull ($\gamma$ = .333, .250, .20)
distributions.

By specifying $Q_o(u) = \Phi^{-1}(u)$ (normal distribution) we obtain
the quantile-box plot of Figure 6.D for the pooled transformed data.
The plot is not incompatible with a normal shape except in the tails.
Figure 6.E is a plot of D(u), the raw transformation distribution
function. The line D(u) = u has also been superimposed on the
figure. Serious departures from the line D(u) = u are not obvious.
The value of $\rho(1)$ is .0206. Under $H_o$, $2n\rho(v)$, $v \ne 0$, has an
asymptotic $\chi^2$ distribution with two degrees of freedom. The .05
critical value of a $\chi^2_2$ is 5.99. The value of $2n\rho(1)$ is 3.95 where
n = 96 and $\rho(1)$ = .0206. This is also evidence that the normal
distribution is compatible with the data. Finally CAT selects an
optimal order of zero which is consistent with the other diagnostics
in failing to reject a normal distribution for the data. Based on
this stage of analysis we conclude that $Q_i(u) = \mu_i + \sigma_i \Phi^{-1}(u)$,
i = 1, 2, 3, 4. Using a consensus of the diagnostics, the other
distributions, i.e. logistic and Weibull ($\gamma$ = .333, .250, .20),
are not as compatible with the data as the normal distribution.

Stage 2:  We compute estimates $\hat\mu_i$, $\hat\sigma_i$ of $\mu_i$, $\sigma_i$ using LCOS
based on the normal distribution using the asymptotically optimal
coefficients and spacings (r = 7) of Eubank (1979). Figure 6.F
plots $\hat\mu_i$ and $\hat\sigma_i$ vs $X_i$. The figure suggests a linear model for the
$\hat\mu_i$'s but it appears that there is no definite trend for the $\hat\sigma_i$'s.
However we shall attempt to fit the linear model of (5.2.2) for
both $\mu_i$ and $\sigma_i$ in Stage 3.

Stage 3:  We use generalized least squares to obtain the
following fitted regression lines for $\mu_i$ and $\sigma_i$:

$$\hat\mu_i = 19.8642 + .5024\, X_i$$

$$\hat\sigma_i = 1.3914 + .0363\, X_i$$

Since we suspected that $\sigma_i$ does not change significantly with $X_i$,
we test $H_\sigma: \beta_\sigma = 0$ giving $z_\sigma$ = .8117. Based on this value we fail
to reject $H_\sigma$ at the $\alpha$ = .05 level and conclude that $X_i$ does not
have a significant linear effect on $\sigma_i$. A test of $H_\mu: \beta_\mu = 0$ gives
$z_\mu$ = 7.943 and we reject $H_\mu$ at the .05 level.

Stage 4:  Based on the results of Stage 3 and by analogy
with previous investigations of the data we estimate A(u) and B(u)
for u = .25, .50, and .75. The estimated values are

A(.25) = 18.926 ,

B(.25) = .466 ,

A(.50) = 19.864 ,

B(.50) = .502 ,

A(.75) = 20.802 ,

B(.75) = .527 .

These values are comparable to those of Table 6.1.

While we reached the same general conclusions as the other
investigations, our technique has several distinct advantages over
the other procedures:

1)  The  procedure  is  flexible  enough  to  incorporate  virtually 
any  specified  distributional  shape.  Griffiths  and  Willcox 
(1978)  only  use  the  normal  distribution.  Angers  (1979) 
and  Hogg  (1975)  use  nonparametric  methods. 

2)  The  procedure  is  computationally  simple.  Both  Griffiths 
and  Willcox  (1978)  and  Angers  (1979)  use  techniques  that 
are  fairly  complicated  and  involve  iterative  solutions. 
Hogg's  (1975)  technique  is  graphical  and  seems  very 
subjective  in  all  phases  of  estimation. 

3)  The procedure uses simple well-known procedures for
hypothesis testing.

One should note the danger of trying to predict salaries at other
universities or outside the range of years of service using the
results of this analysis. As the years of service increase, the
quantile curves will undoubtedly flatten.

6.2  Green  Sunfish  Example 

Matis  and  Wehrly  (1979)  illustrate  compartmental  modeling 
techniques  using  data  from  a  study  to  investigate  the  resistance 
of the green sunfish, Lepomis cyanellus, to various levels of
thermal  pollution.  The  data  consist  of  the  time  until  death  (Y) 
of  samples  of  fish  exposed  to  water  heated  to  a  range  of  sub-lethal 
and  lethal  temperature  (X).  As  part  of  their  analysis  Matis  and 
Wehrly  utilize  samples  at  the  temperature  levels  of  39.5°C  and 
39.7°C.  They  model  the  time  until  death  as  a  three-parameter 
Weibull  distribution  and  estimate  all  three  parameters  for  each 
sample.  LaRiccia  (1979)  uses  the  temperature  levels  39.5°C,  39.6°C, 
and  39.7°C  and  using  a  Weibull  model,  he  estimates  all  three  parameters 
for  each  sample  using  minimum  quantile  distance  estimators.  The 
estimates  of  Matis  and  Wehrly  (1979)  and  those  of  LaRiccia  (1979) 
using  r  =  6  quantiles  are  summarized  in  Table  6.2.  LaRiccia  (1979) 
states that for the temperature levels of 39.5°C and 39.7°C the data
fit a Weibull distribution well with the estimated parameters but
that the estimated parameter values for a temperature of 39.6°C are
unrealistic.

For this study we use the ten (k = 10) temperature levels
38.9°C, 39.0°C, and 39.3°C to 40.0°C in steps of .1°C, and model each
of the ten populations as a location-scale shift of a Weibull
distribution with a common but unknown shape parameter. While this is
a reasonable model for a time until failure distribution, it should be
noted that in the ichthyological literature tolerance times of fish are
often assumed to have a lognormal distribution.


Table 6.2  Estimated Parameters for Green Sunfish Data

                                $\hat{\mu}$    $\hat{\sigma}$    $\hat{c}$

a.  39.5°C
    Matis and Wehrly (1979)     96. min        1.00*             3029.
    LaRiccia (1979)             135.37         79.46             1.15

b.  39.6°C
    LaRiccia (1979)             91.3           4.96 x 10^5       1.70 x 10^4

c.  39.7°C
    Matis and Wehrly (1979)     35. min        .599*             2.486
    LaRiccia (1979)             48.83          48.58             1.46

*The scale parameter Matis and Wehrly (1979) estimate is
$k = (1/\sigma)^c$.  The value $\hat{\sigma}$ is obtained by
$\hat{\sigma} = (\hat{k})^{-1/c}$.
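
The reparameterization in the table footnote can be checked numerically. The sketch below assumes the footnote relations read as k = (1/sigma)^c and sigma = k^(-1/c); the function names and the numeric inputs are illustrative only and are not values taken from Table 6.2.

```python
def k_from_sigma(sigma: float, c: float) -> float:
    """Rate-type scale parameter k = (1 / sigma) ** c (assumed footnote form)."""
    return (1.0 / sigma) ** c


def sigma_from_k(k: float, c: float) -> float:
    """Invert the relation above: sigma = k ** (-1 / c)."""
    return k ** (-1.0 / c)


# Hypothetical inputs chosen only to show that the two maps are inverses.
sigma, c = 50.0, 2.5
k = k_from_sigma(sigma, c)
print(f"k = {k:.6g}, recovered sigma = {sigma_from_k(k, c):.6g}")
```

Because the two functions are inverses for any fixed c > 0, a scale estimate reported in one parameterization can be moved to the other without refitting.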

The reasonableness of our distributional assumptions is investigated
in Stage 1 below. Our goals in this study are twofold:

1)  to  investigate  if  there  is  a  significant  difference  in 
the  location  and  scale  parameters  of  the  time  until 
failure  distributions  for  these  temperature  levels. 

2)  to  estimate  quantile  regression  lines  for  u  =  .50  and 
.90  (i.e.  for  the  50th  and  90th  percentiles  of  the  time 
until  failure  distributions). 

The sample sizes are $n_i = 20$ for $i = 1, \ldots, 7$, $n_8 = 11$,
and $n_9 = n_{10} = 10$. Figure 6.G presents the data plotted as a
function of temperature level. The four stages of analysis are
described below:


Stage 1: The quantile-box plots of all ten samples are given in
Figure 6.H. The shift in location is evident but is not uniform for
all the temperature levels. The decreasing spread as temperature
increases is apparent by examining $Q_i(.75) - Q_i(.25)$. The plots
of $Q_i(u)$ seem fairly symmetric except for i = 3 ($X_3$ = 39.3°C),
i = 6 ($X_6$ = 39.6°C), and i = 9 ($X_9$ = 39.9°C). For i = 3, 6 and
8, $Q_i(u)$ is slightly skewed left and for i = 2 and 9, $Q_i(u)$ is
skewed right. The plots suggest that a potential set of values of
the shape parameter might be in the range (.5, .2).

Using the estimator $\hat{\gamma}_p$ of $\gamma$ defined by (3.2.6)
and using $u_1$ = .0002, $u_2$ = .0115, $u_3$ = .5429, which are
optimal for $\gamma$ = .3, we obtain the estimate
$\hat{\gamma}_p$ = .416.



Using the technique of Section 3.3 we test the goodness-of-fit of
the data to the Weibull distribution for $\gamma_0$ = .333, .25, .20,
.167, and .143 using the model

$$Q_i(u) = \mu_i + \sigma_i \, [-\log(1-u)]^{\gamma_0}, \quad i = 1, \ldots, 10.$$
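
The location-scale Weibull quantile model above can be evaluated directly. The following is a minimal sketch; the function name and the location and scale values are illustrative assumptions, not estimates from the sunfish data.

```python
import math


def weibull_quantile(u: float, mu: float, sigma: float, gamma0: float) -> float:
    """Return mu + sigma * (-log(1 - u)) ** gamma0 for 0 < u < 1."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    return mu + sigma * (-math.log1p(-u)) ** gamma0


# Illustrative location and scale values, not estimates from the data.
mu, sigma = 100.0, 90.0

# The candidate shape values considered in the goodness-of-fit stage.
for gamma0 in (0.333, 0.25, 0.20, 0.167, 0.143):
    median = weibull_quantile(0.50, mu, sigma, gamma0)
    upper = weibull_quantile(0.90, mu, sigma, gamma0)
    print(f"gamma0={gamma0:.3f}  Q(.50)={median:8.3f}  Q(.90)={upper:8.3f}")
```

At u = 1 - exp(-1) the bracketed term equals 1, so the quantile is mu + sigma whatever the shape value; this gives a quick sanity check on any implementation.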

By specifying $\gamma_0$ = .333, we obtain the quantile-box plot of
Figure 6.I for the pooled transformed data. We are testing whether
the pooled transformed data fit an exponential distribution, and
the plot is not incompatible with an exponential shape except for
the two outliers in the right tail. Figure 6.J is a plot of D(u),
the raw transformation distribution function, with the line D(u) = u
superimposed on the plot. Serious departures of D(u) from the line
D(u) = u are not evident except as u gets close to 1. The value of
p(2) is .0097, so that comparing 2n p(2) (= 3.317 where n = 171) to
the .05 critical value of $\chi^2_2$ (= 5.99) yields further
evidence for failing to reject $\gamma_0$ = .333 as the true value
of $\gamma$. Finally, CAT selects an optimal order of zero, which is
consistent with the other diagnostics in accepting $\gamma_0$ = .333
as an appropriate value of $\gamma$.
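
The arithmetic of this comparison can be reproduced in a few lines. The sketch assumes the reference distribution is chi-square with 2 degrees of freedom (the distribution whose upper .05 point is 5.99), for which the quantile has a closed form; the variable names are illustrative.

```python
import math

n = 171        # pooled sample size reported in the text
p2 = 0.0097    # reported value of p(2)

statistic = 2 * n * p2

# For 2 degrees of freedom the chi-square CDF is 1 - exp(-x / 2),
# so the upper-alpha critical value is -2 * log(alpha).
alpha = 0.05
critical = -2.0 * math.log(alpha)

print(f"statistic = {statistic:.3f}, critical value = {critical:.2f}")
print("reject" if statistic > critical else "fail to reject")
```

Since the statistic falls below the critical value, the sketch reaches the same decision as the text: the hypothesized shape value is not rejected.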

The values $\gamma_0$ = .25 and .20 do not prove to be acceptable
values of $\gamma$ based on a consensus of the diagnostics from the
ONESAM analysis of the pooled transformed data. Based on this stage
of the analysis we conclude that

$$Q_i(u) = \mu_i + \sigma_i \, [-\log(1-u)]^{.333}, \quad i = 1, \ldots, 10.$$

Stage 2: We compute estimates of $\mu_i$ and $\sigma_i$ using LCOS
(linear combinations of order statistics) based on the Weibull
distribution with $\gamma$ = .333, using the optimal coefficients
and spacings (r = 6) of Hassanein (1971). Figure 6.K plots
$\hat{\mu}_i$ and $\hat{\sigma}_i$ vs $X_i$. The figure suggests a
linear model for $\hat{\sigma}_i$ and the presence of a linear trend
for $\hat{\mu}_i$. It should be noted that $\hat{\mu}_i$ is less
than min($Y_{ij}$, j = 1, ..., $n_i$) for all i, which is desirable.
However, the $\hat{\mu}_i$'s vacillate, so that a uniform decrease
in the threshold of the tolerance times as the temperature increases
is not evident. The failure to detect a uniform trend is
attributable to competing physiological causes of death in the
specified temperature range.

We can compare our estimates of $\mu$, $\sigma$, and $\gamma$ (= 1/c)
to those of Table 6.2 for i = 5, 6, 7 (39.5°C, 39.6°C, 39.7°C). Our
values are summarized in Table 6.3. The value $\gamma$ = .333 which
we used is not consistent with the estimate, $\hat{c}$, of LaRiccia
but is consistent with that of Matis and Wehrly for the temperature
39.7°C.
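
The comparison rests on the conversion gamma = 1/c. A short sketch of that arithmetic follows; it assumes the shape estimates are the values in the last column of Table 6.2 for the rows shown, and the dictionary layout is purely illustrative.

```python
gamma0 = 0.333  # shape value adopted in Stage 1

# Shape estimates c-hat as read from Table 6.2 (assumed column reading).
c_hat = {
    ("LaRiccia", 39.5): 1.15,
    ("LaRiccia", 39.7): 1.46,
    ("Matis and Wehrly", 39.7): 2.486,
}

# Convert each Weibull shape c to the exponent gamma = 1 / c.
implied_gamma = {key: 1.0 / c for key, c in c_hat.items()}

for (source, temp), g in implied_gamma.items():
    print(f"{source:17s} {temp}°C  gamma = {g:.3f}  |gamma - gamma0| = {abs(g - gamma0):.3f}")
```

Under this reading, the Matis and Wehrly value at 39.7°C gives an exponent close to the adopted one, while the LaRiccia values give exponents roughly twice as large or more.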


Table 6.3.  Estimated Parameters of Green Sunfish Data, i = 5, 6, 7

                $\hat{\mu}_i$   $\hat{\sigma}_i$   $\hat{\gamma}_i$   $\hat{\gamma}_p$   $\gamma_0$

i = 5, 39.5     109.319         95.899             .206               .416               .333
i = 6, 39.6     63.926          91.753             .416               .416               .333
i = 7, 39.7     29.708          66.847             .4)6               .416               .333

We shall attempt to fit the linear model of (5.2.1) for both
$\mu_i$ and $\sigma_i$ in Stage 3.

Stage 3: We use generalized least squares to obtain the following
fitted regression lines for $\mu_i$ and $\sigma_i$:

$$\hat{\mu}_i = 4601 - 114.25 \, X_i$$
$$\hat{\sigma}_i = 6707 - 167.188 \, X_i .$$

The asymptotic variances of $\hat{\alpha}_\mu$ and
$\hat{\alpha}_\sigma$ are very large. One solution to this might be
to rescale the X values by subtracting median($X_i$) from each one.
If we let $X_i^* = X_i - 39.55$, we get the regression lines

$$\hat{\mu}_i = 89.560 - 124.185 \, X_i^*$$
$$\hat{\sigma}_i = 102.0931 - 181.777 \, X_i^*$$
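
Why centering helps can be seen from the variance of a fitted intercept. The sketch below assumes ordinary least squares with equal error variances, which is a simplification of the generalized least squares fit used here; under that assumption the intercept variance is proportional to 1/k + xbar^2 / Sxx. The function name is illustrative.

```python
import statistics

# The ten temperature levels: 38.9, 39.0, and 39.3 through 40.0 by 0.1.
temps = [38.9, 39.0] + [39.3 + 0.1 * j for j in range(8)]


def intercept_variance_factor(x):
    """Var(intercept) / error variance for a straight-line OLS fit."""
    k = len(x)
    xbar = sum(x) / k
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return 1.0 / k + xbar ** 2 / sxx


center = statistics.median(temps)  # approximately 39.55
raw = intercept_variance_factor(temps)
centered = intercept_variance_factor([x - center for x in temps])

print(f"median = {center:.2f}")
print(f"variance factor, raw X:      {raw:.1f}")
print(f"variance factor, centered X: {centered:.4f}")
```

With the raw temperatures the intercept is an extrapolation to X = 0, far from the data, so its variance factor is in the thousands; after centering it is close to 1/k.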

Testing $H_\mu$: $\beta_\mu = 0$, we get z = 9.31 and consequently
we reject $H_\mu$. Testing $H_\sigma$: $\beta_\sigma = 0$, we get
z = 12.48 and we also reject $H_\sigma$. The hypothesis $H_\mu$ is
equivalent to the hypothesis that all $\mu_i$'s are equal, and
$H_\sigma$ is equivalent to the hypothesis that all $\sigma_i$'s are
equal. Thus for the k sample comparison of location and scale
parameters we conclude that the $\mu_i$'s are significantly
different, as are the $\sigma_i$'s.
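
The strength of these rejections can be gauged from the normal tail areas. This is a minimal sketch assuming two-sided tests referred to the standard normal distribution and using only the magnitudes of the reported z statistics; the function name is illustrative.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided standard normal tail area for an observed z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


for label, z in (("location slope", 9.31), ("scale slope", 12.48)):
    p = two_sided_p(z)
    decision = "reject" if p < 0.05 else "fail to reject"
    print(f"{label}: |z| = {z:.2f}, p = {p:.3e} -> {decision} at the .05 level")
```

Both tail areas are many orders of magnitude below any conventional level, so the conclusion does not depend on the particular choice of significance level.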

Stage 4: Based on the results of Stage 3 we estimate A(u) and B(u)
for u = .50 and .90. The resulting quantile regression lines are

$$Q_i(.50) = 160.326 - 250.183 \, X_i^*$$
$$Q_i(.90) = 324.638 - 542.742 \, X_i^*$$

These lines are drawn on Figure 6.G.
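
The Stage 4 arithmetic can be sketched as follows, assuming Q_i(u) = mu_i + sigma_i Q_0(u), so that A(u) = alpha_mu + alpha_sigma Q_0(u) and B(u) = beta_mu + beta_sigma Q_0(u), with the coefficients taken from the Stage 3 lines in the rescaled predictor X*. The function and constant names are illustrative. The printed coefficients are reproduced with Q_0(u) = -log(1 - u), which is the default here; the exponent is left as a parameter so that the form [-log(1 - u)]^gamma can be evaluated as well.

```python
import math

# Stage 3 fitted lines in the rescaled predictor X* = X - 39.55.
ALPHA_MU, BETA_MU = 89.560, -124.185
ALPHA_SIGMA, BETA_SIGMA = 102.0931, -181.777


def q0(u: float, exponent: float = 1.0) -> float:
    """Baseline quantile function (-log(1 - u)) ** exponent."""
    return (-math.log(1.0 - u)) ** exponent


def quantile_line(u: float, exponent: float = 1.0):
    """Return (A(u), B(u)): intercept and slope of the u-th quantile line."""
    q = q0(u, exponent)
    return ALPHA_MU + ALPHA_SIGMA * q, BETA_MU + BETA_SIGMA * q


for u in (0.50, 0.90):
    a, b = quantile_line(u)
    print(f"u = {u:.2f}: A(u) = {a:.3f}, B(u) = {b:.3f}")
```

Because A(u) and B(u) depend on u only through Q_0(u), lines for further quantiles follow from the same four regression coefficients without refitting.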

The estimated quantile regression lines seem fairly reasonable.
Better knowledge of the physiological effects of thermal pollution
should lead to a better choice of a range of temperature levels over
which one effect is the dominant cause of death. Larger sample sizes
will result in better estimation of $\gamma$ and of $\mu_i$ and
$\sigma_i$, which will improve estimates of the quantile regression
line. We are convinced that this technique of quantile regression is
appropriate and very useful for analyzing this type of data.

7.  CONCLUSIONS


7.1  Summary: 

In this dissertation we have investigated a quantile function
approach to the k sample quantile regression problem. By modeling
the quantile functions of the k populations as location-scale shifts
of a completely specified quantile function, $Q_0$, and then
modeling the relationship of the location and scale parameters
$\mu_i$ and $\sigma_i$ to a predictor variable $X_i$, four stages in
the analysis have been delineated.

Stage 1, the identification of $Q_0$, is discussed in Section 3.
Multiple quantile-box plots are used as a quick graphic technique
to identify the qualitative characteristics, e.g. skewness,
symmetry, modality, and tail behavior, of the distribution of each
population. Parzen's (1979) data modeling technique for one
population is described and extended to a goodness-of-fit procedure
for k populations. An estimator, $\hat{\gamma}$, of the shape
parameter, $\gamma$, of $Q_0$ is given and is shown to have an
asymptotic normal distribution. Optimal spacings for $\hat{\gamma}$
when $Q_0$ corresponds to the Weibull distribution are given.

Section  4  describes  Stage  2,  the  estimation  of  location  and 
scale  parameters  using  k  independent  samples  of  data.  Two  approaches 
to  selecting  optimal  linear  combinations  of  order  statistics  for  one 
population  are  discussed  and  shown  to  provide  computationally  simple 
and  statistically  efficient  estimators  of  the  location  and  scale 
parameters of k populations. A study of bias, variance and mean
squared error of estimators based on a misspecified value of the
shape parameter of the Weibull distribution shows that mild
misspecification of $Q_0$ does not seriously affect the estimation
of location and scale parameters.

Stage 3, the estimation of the parameters of a linear regression
model for $\mu_i$ and $\sigma_i$, is discussed in the first part of
Section 5. The estimated parameters and their joint asymptotic
normality are based on the generalized least squares technique. The
model used for the k sample quantile regression is flexible in that
it accommodates almost any specification of $Q_0$ yet leads to
simple estimators of the regression parameters.

Stage 4 is the estimation of and inference about quantile
regression curves. The estimation technique is simple and, contrary
to many existing techniques, one can estimate regression curves for
several quantiles without having to reestimate the regression
parameters. Inference about the curves is based on the asymptotic
normality of the estimated parameters.

Finally, in Section 6, the technique is illustrated using two
data sets. In both cases an appropriate specification of $Q_0$ is
made and the estimated quantile regression curves fit the data well.
The results are consistent with those of previous investigators.
The two analyses illustrate the flexibility and simplicity of the
k-sample quantile regression procedure.

7.2  Problems for Further Research

The most critical stage of the analysis, as we perceive the
k sample quantile regression problem, is the identification of
$Q_0$. There are several areas for future investigation dealing with
this stage. There is a multitude of goodness-of-fit procedures for
one population. These procedures should be extended to k
populations, and the resulting k population procedures should be
compared with our GOF procedure.

The estimator $\hat{\gamma}$ (and $\hat{\gamma}_p$) of the shape
parameter $\gamma$ is formulated in general terms. Tables of optimal
values for $u_1$, $u_2$, and $u_3$ should be available for
distributions other than the Weibull, especially the lognormal
distribution. The use of this type of estimator should be extended
to the case of censored samples. Other methods of estimating
$\gamma$, e.g. cross-validation techniques (Stone 1974), might prove
useful.

Optimal linear combinations of order statistics yield
estimators of location and scale parameters that are simple to
compute and statistically efficient but require tables of optimal
spacings and coefficients. Eubank (1979) suggests the investigation
of techniques to use spacings from a subinterval of [0, 1) for
distributions where the simultaneous estimation of location and
scale parameters is not possible using the continuous parameter
time series approach. This would be useful in the k sample quantile
regression problem also. Tables of optimal spacings and coefficients
for a wider range of values of the shape parameter of the Weibull
distribution need to be made available.

While this formulation of the k-sample quantile regression
has proven its worth, it would be worthwhile to investigate a
quantile function approach to the k-sample comparison problem.
